Preface



The scope of reliability engineering is extremely wide, encompassing many
areas of engineering technology. Reliability engineering helps ensure the
success of space missions, maintain the national security, deliver a steady
supply of electric power, provide reliable transportation, and so on. There
has been a considerable growth of knowledge in several areas of reliability
engineering and its applications. These areas are characterized either by
specific methodologies (fault trees, for example) that have found
applications across various disciplines, or by topics that have developed a
structure of their own, like power system reliability. New definitions,
concepts, and techniques have been developed in these areas, and the
knowledge of generic reliability theory alone is not enough for the
appreciation of these ideas.

Reliability engineers deal with projects relating to various disciplines or
with discrete aspects of a complex project and need the knowledge of
diverse topics. An engineer needing information in these areas generally
faces a great deal of difficulty and inconvenience, since these topics are
discussed in various technical papers and in specialized books but have not
been treated within the framework of a single book. This book is intended
to fulfill the need for a single volume that considers these diverse topics. In
this book topics of current interest are treated in such a manner that the
reader needs no previous knowledge to understand the contents. We have
tried to focus more on the structure of the concepts than the minute
details. References to relevant literature are provided for the reader who
wants to delve more deeply into particular topics.

The first chapter reviews the role and importance of reliability
engineering in the planning and design process and outlines the scope of the
book. Chapter 2 reviews the basic probability theory and other pertinent
mathematical topics. Fundamental concepts and reliability techniques are
described in Chapter 3. For readers not familiar with the basic concepts of
reliability theory, these two chapters provide sufficient background for
understanding this book.

Subsequent chapters deal with important techniques and specific areas 
of application. These chapters are self-contained and the reader with some 
background in reliability can understand them without referring to 
Chapters 2 and 3. Readers new to this area should find Chapters 2 and 3 
helpful. 

Chapter 4 presents the important and useful techniques of fault-tree
analysis and common-cause failures. These two topics have been of
considerable interest in recent years. Software reliability is discussed in
Chapter 5, which describes the models and techniques for assessing and
enhancing software reliability. The commonly used models and techniques
for studies of mechanical and human reliability are presented in Chapters 6
and 7, respectively. Chapter 8 contains the reliability evaluation techniques
and models for networks composed of devices with two mutually exclusive
failure modes. Markov models of repairable components are also described
in this chapter.

Chapters 9 to 11 present three significant areas of application: electric
power systems, transit systems, and computer systems. These areas of
application have attracted a considerable amount of attention and have
seen a substantial growth of knowledge. The reader will find a certain
commonality of concepts but a great diversity in definitions, models, and
methods.

The book is intended primarily for engineers, managers, graduate
students, and other professionals interested in the subject of reliability. It
can be adopted for a variety of graduate or short professional courses. A
general course in reliability engineering would focus on Chapters 2 to 4, 7,
8, and selected portions of the remaining chapters. A course in power
systems reliability could be based on Chapters 2, 3, and 9. Chapters 2, 3,
and 10 could be used for a course on the reliability of urban
transportation systems, and a computer systems reliability course would use
Chapters 2, 3, 5, and 11.

Our experience on many projects and environments, teaching, and
exposure to several outstanding experts in this area filter through the pages
of this book. We would specifically like to thank our former colleagues and
fellow professionals at Ontario Hydro and the Ontario Ministry of
Transportation and Communications, and our present colleagues at the
University of Ottawa and Texas A & M University, as well as many other
professionals who, through discussion and writing, have influenced our
thinking. We would also like to thank the Department of Electrical
Engineering, Texas A & M University, for assistance during the
preparation of the manuscript.

We thank our wives, Rosy Dhillon and Gurdeep Singh, for their
patience and ever present help during the preparation of the manuscript, and
we appreciate the support and encouragement of our parents throughout.

B. S. Dhillon
C. Singh 



Ottawa, Ontario



Introduction



1.1 RELIABILITY ASSURANCE

Reliability is an important consideration in the planning, design, and
operation of systems. We have always expected trains to be on time,
electric power not to fail, and so on. Before the Second World War, the
concept of reliability had been only intuitive, subjective, and qualitative.
The concept of quantitative reliability appears to have had its inception
during the Second World War, and continues today, required by the size
and complexity of modern systems.

The modern discipline of reliability is distinguished from the old concept
by quantitative evaluation versus the older qualitative evaluation. When
reliability is defined quantitatively, it is specified, analyzed, and measured
and becomes a parameter of design that can be traded off against other
parameters such as cost and performance [1].

The modern discipline of reliability had its origins in the military and
space technology. Its influence has been steadily spreading into many
other applications. This again is due to the growing complexity of systems,
competitiveness in the market, and an ever-increasing competition for
budget and resources. Neither can unreliability be tolerated nor are
overdesigns permissible in today's market. The cost of failures in modern
power systems and urban transportation systems goes much beyond the cost
of repair or replacement of affected parts. The inconvenience to consumers
and commuters, lost products, crime, and decreased productivity cost
much more than the price of immediate repairs.

1.2 PLANNING AND DESIGN

Quantitative reliability can play an important role in the planning, design,
and operation of any system. As an example, consider transit facilities
being planned for a city. The reliability characteristics of the vehicles and
other equipment should be considered at an early stage [2]. The number of
vehicles on scheduled maintenance and the number of vehicles that require
service because of failures should be allowed for while estimating the fleet
size. A quantitative and consistent approach would be to decide the level of
service reliability with which the demand should be met and then develop
a system reliability model utilizing failure rate data [2] to estimate the
number of spare vehicles.

Another approach considers reliability both as a constituent cost and as
an effectiveness constraint in assessing the total cost of system acquisition
and ownership, that is, life cycle costing. There is a financial penalty
associated with vehicle failures since they must be repaired and other
vehicles must be available in reserve to maintain the required service level.
Also, if the system reliability is not adequate, it can lead to loss of revenue
because of reduced ridership. On the other hand, it generally costs more
money to build higher reliability into the system. Therefore, a trade-off can
be made between cost and reliability.

The inherent level of reliability is built into a system during its design
phase [3]. Lack of control and direction during this period can result in
costly retrofits or poor service reliability during the life cycle of the system
[2]. The role of reliability engineering during the design process [2] is
indicated in Figure 1.1.



1.3 TECHNIQUES AND APPLICATIONS

The body of knowledge regarding the theory and practice of reliability has
been steadily growing. Not only has the basic reliability theory [3] become
more sophisticated, but relatively new techniques have been developed and
the areas of application considerably expanded. Techniques like fault trees
have found applications across various disciplines. The topic of three-state
devices is of great interest to reliability engineers in various disciplines.
Software reliability, mechanical reliability, and human reliability may have
borrowed some concepts from traditional hardware reliability but have
distinguishing characteristics and concepts of their own. The concept of an
error or a bug, for example, is different from the hardware failures, and
mechanical reliability is uniquely based on interference models. The areas
of application like computers, power systems, and transit systems have
developed their own definitions, concepts, and techniques. There is a
certain commonality of concepts but great diversity in definitions, models,
and methods. The forced outage rate, for example, when applied to
generating units means the unavailability of the unit. The degree of growth
of knowledge in particular areas can be seen by the fact that there are at
present three books in English alone on power system reliability.



1.4 SCOPE

Engineers today dealing with large and diverse projects require
information on reliability as it affects differing systems. These topics are
discussed either in various technical papers or in specialized books and are
presently not treated within the framework of a single book. An engineer
needing information in these areas generally faces a great deal of difficulty.
This book is an attempt to fulfill this need by treating these diverse topics
in a single volume. Previous knowledge is not necessary to understand the
contents, since two chapters on basic reliability theory are provided to give
enough background. This book will find application in many disciplines
and will be especially useful to reliability engineers, system engineers, and
students of reliability.
Reliability Mathematics



2.1 INTRODUCTION 

This chapter presents some basic mathematical concepts needed to
understand subsequent chapters. Such topics as set theory, discrete and
continuous random variables, probability distributions, hazard plotting, and
differential equations are discussed briefly and provide an overview of the
subject. The reader requiring in-depth knowledge of these concepts should
consult references 1 and 7.



2.2 SET THEORY

Sets are normally represented by capital letters such as $X$, $Y$, $Z$. Elements
are denoted by lower case letters such as $c$, $d$, $e$.

If $k$ is an element of set $B$, then it is denoted as $k \in B$, and its negation
is denoted as $k \notin B$. If $X$ is a subset of set $Y$ it is written as

$$X \subset Y \quad \text{or} \quad Y \supset X \tag{2.1}$$

The negation of the above is written as

$$X \not\subset Y \quad \text{or} \quad Y \not\supset X \tag{2.2}$$

If two sets are equal (that is, each set is a subset of the other) they are
expressed as

$$X = Y \tag{2.3}$$

The statement (2.3) is true if and only if

$$X \subset Y \quad \text{and} \quad Y \subset X \tag{2.4}$$

2.2.1 Union of Sets

The union of sets is denoted by the symbol $\cup$ or $+$. For example, if
$X + Y = Z$, it means that all the elements in set $X$ or in set $Y$ or in both sets
$X$ and $Y$ are contained in set $Z$. The statement

$$Z = X + Y \tag{2.5}$$

may also be written as $Z = X \cup Y$.

This case may be represented on the Venn diagram as shown in Figure
2.1.

Figure 2.1 Venn diagram for the union of sets X, Y.

2.2.2 Intersection of Sets

The intersection of sets is denoted by $\cap$ or dot ($\cdot$). For example, if the
intersection of sets or events $C$ and $D$ is represented by a third set, say $T$,
then this set contains all elements which belong to both $C$ and $D$. It is
denoted as

$$T = C \cap D \quad \text{or} \quad T = C \cdot D \tag{2.6}$$

The above expression is shown on the Venn diagram in Figure 2.2.

Figure 2.2 Venn diagram for intersection.

If the intersection of sets $C$ and $D$ is empty, then sets $C$ and $D$ are called
mutually exclusive or disjoint sets. This may be represented on a Venn
diagram as shown in Figure 2.3.

2.2.3 Basic Laws of Boolean Algebra

Some laws of Boolean algebra are as follows:

1. Distributive laws

$$X(Y + Z) = (XY) + (XZ) \tag{2.7}$$
$$X + (YZ) = (X + Y)(X + Z) \tag{2.8}$$

2. Boolean identities

$$X + X = X \tag{2.9}$$
$$X \cdot X = X \tag{2.10}$$

3. Absorption laws

$$X + (XY) = X \tag{2.11}$$
$$X(XY) = X \cdot Y \tag{2.12}$$
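
The laws (2.7) to (2.12) can be checked exhaustively, since each variable takes only two values. The following is a minimal sketch, assuming that "+" is read as logical OR and juxtaposition as logical AND; the names `LAWS` and `check_laws` are illustrative only.

```python
from itertools import product

# Each law is written with Python booleans: "+" in the text becomes `or`,
# juxtaposition (intersection) becomes `and`.
LAWS = {
    "2.7": lambda x, y, z: (x and (y or z)) == ((x and y) or (x and z)),
    "2.8": lambda x, y, z: (x or (y and z)) == ((x or y) and (x or z)),
    "2.9": lambda x, y, z: (x or x) == x,
    "2.10": lambda x, y, z: (x and x) == x,
    "2.11": lambda x, y, z: (x or (x and y)) == x,
    "2.12": lambda x, y, z: (x and (x and y)) == (x and y),
}


def check_laws():
    """Return {equation number: True if the law holds for all 8 assignments}."""
    cases = list(product([False, True], repeat=3))
    return {name: all(law(x, y, z) for x, y, z in cases)
            for name, law in LAWS.items()}
```

Calling `check_laws()` returns a dictionary keyed by equation number, so a law that was transcribed incorrectly would show up as a `False` entry.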
2.3 PROBABILITY THEORY

Probability theory may be defined as the study of random experiments.
The most important event-related properties of probability are as follows:

1. For each event $X$, the event probability is

$$0 \le P(X) \le 1 \tag{2.13}$$

2. In the case of mutually exclusive events, say $x_1, x_2, x_3, \ldots, x_n$, the
probability of union of events is given by

$$P(x_1 + x_2 + x_3 + \cdots + x_n) = P(x_1) + P(x_2) + P(x_3) + \cdots + P(x_n) \tag{2.14}$$

3. The union of $n$ events is given by

$$P(x_1 + x_2 + \cdots + x_n) = \left[P(x_1) + P(x_2) + \cdots + P(x_n)\right] - \left\{P(x_1 x_2) + P(x_1 x_3) + \cdots + P(x_{n-1} x_n)\right\} + \cdots + (-1)^{n-1}\left\{P(x_1 x_2 x_3 \cdots x_n)\right\} \tag{2.15}$$

For example, in the case of two statistically independent events $x_1$ and
$x_2$, the probability expression becomes:

$$P(x_1 + x_2) = P(x_1) + P(x_2) - P(x_1)P(x_2) \tag{2.16}$$

4. Probability of the sample space $S$ is always equal to unity, that is,

$$P(S) = 1 \tag{2.17}$$

The negation of the sample space $S$ is written as $\bar{S}$. Thus

$$P(\bar{S}) = 0 \tag{2.18}$$

5. The $n$ events intersection probability expression is as follows:

$$P(x_1 x_2 \cdots x_n) = P(x_1)\,P(x_2 / x_1) \cdots P(x_n / x_1 x_2 \cdots x_{n-1}) \tag{2.19}$$

where $P(x_2 / x_1)$ implies probability of $x_2$ given $x_1$.

If all the events are statistically independent the above expression
becomes

$$P(x_1 x_2 x_3 \cdots x_n) = P(x_1)P(x_2)P(x_3) \cdots P(x_n) \tag{2.20}$$

6. The events $X$ and $Y$ are said to be independent, if and only if

$$P(XY) = P(X)P(Y) \tag{2.21}$$

If events $X$ and $Y$ cannot satisfy the above relationship, then these
events are said to be dependent. The conditional probability of $x_n$,
given that the events $x_1, x_2, \ldots, x_{n-1}$ have occurred, is obtained by
the following relationship:

$$P(x_n / x_1 x_2 \cdots x_{n-1}) = \frac{P(x_1 x_2 \cdots x_n)}{P(x_1 x_2 \cdots x_{n-1})} \tag{2.22}$$

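The alternating sum in (2.15) is easy to get wrong by hand for more than two events. The sketch below evaluates it for statistically independent events, where each joint probability is a product as in (2.20), and compares it with the complement form 1 minus the probability that no event occurs. The function names are illustrative assumptions, not part of the text.

```python
from itertools import combinations
from math import prod


def union_probability(probs):
    """P(x1 + ... + xn) for independent events, using the sum in (2.15)."""
    n = len(probs)
    total = 0.0
    for r in range(1, n + 1):
        sign = (-1) ** (r - 1)
        # Independence lets each r-fold joint probability be a product (2.20).
        total += sign * sum(prod(group) for group in combinations(probs, r))
    return total


def union_probability_complement(probs):
    """Same quantity computed as 1 - P(no event occurs)."""
    return 1.0 - prod(1.0 - p for p in probs)
```

For two events with probabilities 0.1 and 0.2 the first function reduces to (2.16) and gives approximately 0.28. The two functions should agree to rounding error for any list of probabilities.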
2.4 RANDOM VARIABLES

Random variables may be discrete or continuous. Both discrete and
continuous variables and the associated probability distributions are
described in these sections.

2.4.1 Discrete Random Variables 

If $Y$ is a random variable on the sample space $S$ along with a countably
infinite set $Y(S) = \{y_1, y_2, y_3, \ldots\}$, then these random variables along with
other finite sets are known as discrete random variables.

Density Function. For a single-dimension discrete random variable $Y$, the
discrete probability function of the random variable $Y$ is represented by
$f(y_i)$ if the following conditions hold:

$$f(y_i) \ge 0 \quad \text{for all } y_i \in R_y \text{ (range space)} \tag{2.23}$$

and

$$\sum_{\text{all } y_i} f(y_i) = 1 \tag{2.24}$$

Cumulative Probability Distribution Function. The cumulative probability
distribution function is defined as

$$F(y) = \sum_{y_i \le y} f(y_i) \tag{2.25}$$

where $F(y)$ is the cumulative probability distribution function.

Furthermore, the value of the cumulative probability distribution function
always satisfies

$$0 \le F(y) \le 1 \tag{2.26}$$

Binomial Distribution. The binomial distribution is a frequently used
distribution in reliability engineering. This is also known as the Bernoulli
distribution. We are often concerned with the probabilities of outcomes
such as the total number of failures in a sequence of $n$ trials. For this
distribution, each trial has two possible outcomes, success and failure,
and the probability of each outcome remains constant from trial to trial.
The binomial probability function $f(x)$ is defined as

$$f(x) = \binom{n}{x} p^{\,n-x} q^{\,x}, \qquad x = 0, 1, 2, \ldots, n \tag{2.27}$$

where x = the number of failures in n trials
      p = the single trial probability of success
      q = the single trial probability of failure

It is always true that the probabilities of failure and success for each
trial sum to unity (i.e., $p + q = 1$).

The probability of $x$ or fewer failures in $n$ trials is known as the
probability distribution function, $F(x)$:

$$F(x) = \sum_{i=0}^{x} \binom{n}{i} p^{\,n-i} q^{\,i} \tag{2.28}$$

where $\binom{n}{i} = n! / [\,i!\,(n-i)!\,]$.
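
As a small numerical illustration, the sketch below evaluates the binomial probability of exactly $x$ failures, $\binom{n}{x} p^{n-x} q^{x}$, and the cumulative probability of $x$ or fewer failures. It assumes $p$ is the single-trial success probability and $q = 1 - p$; the function names are hypothetical.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x failures in n trials (p = success probability)."""
    q = 1.0 - p
    return comb(n, x) * p ** (n - x) * q ** x


def binomial_cdf(x, n, p):
    """Probability of x or fewer failures in n trials."""
    return sum(binomial_pmf(i, n, p) for i in range(x + 1))
```

For example, with three units each having a success probability of 0.9, `binomial_cdf(1, 3, 0.9)` gives the probability that at most one unit fails, which is the reliability of a 2-out-of-3 arrangement under these assumptions.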

Poisson Distribution. This distribution model is used in reliability studies
when one is interested in the occurrence of a number of events that are of
the same kind. Occurrence of each event is represented as a point on a
time scale. In reliability engineering each event represents a failure. The
Poisson density function is defined as

$$f(n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}, \qquad n = 0, 1, 2, \ldots \tag{2.29}$$

where $t$ is the time and $\lambda$ is the constant failure or arrival rate.
The cumulative distribution function $F$ is given by

$$F = \sum_{i=0}^{n} \frac{(\lambda t)^i \exp(-\lambda t)}{i!} \tag{2.30}$$
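
A minimal sketch of the Poisson probability of exactly $n$ failures in time $t$, $(\lambda t)^n e^{-\lambda t}/n!$, and of the cumulative probability of $n$ or fewer failures follows. The names `poisson_pmf` and `poisson_cdf` and the argument `lam` for the rate $\lambda$ are illustrative assumptions.

```python
from math import exp, factorial


def poisson_pmf(n, lam, t):
    """Probability of exactly n events in time t at constant rate lam."""
    m = lam * t
    return m ** n * exp(-m) / factorial(n)


def poisson_cdf(n, lam, t):
    """Probability of n or fewer events in time t."""
    return sum(poisson_pmf(i, lam, t) for i in range(n + 1))
```

One use is spares provisioning: `poisson_cdf(k, lam, t)` is the probability that `k` spares are enough to cover the failures occurring in time `t`, assuming a constant failure rate.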

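Multinomial Distribution. This distribution is applicable to those cases
where a system or device has more than two states. This is an extension of
the binomial distribution, which is only applicable to systems or devices
with two states. The multinomial distribution probability function is
defined as follows:

$$f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1!\,x_2! \cdots x_k!}\; p_1^{x_1} p_2^{x_2} p_3^{x_3} \cdots p_k^{x_k} \tag{2.31}$$

for

$$\sum_{i=1}^{k} p_i = 1, \qquad \sum_{i=1}^{k} x_i = n, \qquad 0 \le p_i \le 1$$

where $x_i$ is the number of occurrences of state $i$ in $n$ trials and $p_i$ is the
single trial probability of state $i$.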



2.4.2 Continuous Random Variables

A real-valued function defined over a sample space $S$ is called a
continuous random variable. In the case of the continuous random variable, the
probability density function is defined as

$$f(t) = \frac{dF(t)}{dt} \tag{2.32}$$

where

$$F(t) = \int_{-\infty}^{t} f(x)\,dx \tag{2.33}$$

and

$$F(\infty) = 1$$

$F(t)$ is called the distribution function of a continuous random variable
$t$. The probability distributions of the continuous random variable are as
follows:

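Uniform Distribution. This is a continuous distribution whose probability
density $f(t)$ and distribution function $F(t)$, respectively, are defined as
follows:

$$f(t) = \begin{cases} \dfrac{1}{\alpha - \beta} & \beta < t < \alpha \\[4pt] 0 & \text{otherwise} \end{cases} \tag{2.34}$$

and

$$F(t) = \begin{cases} 1 & t \ge \alpha \\[2pt] \dfrac{t - \beta}{\alpha - \beta} & \beta < t < \alpha \\[4pt] 0 & t \le \beta \end{cases} \tag{2.35}$$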

Exponential Distribution. This is a widely used distribution in reliability
engineering [2]. It is one of the simplest distributions with which to perform
reliability analysis. The exponential probability density function $f(t)$ is
defined as follows:

$$f(t) = \lambda e^{-\lambda t}, \qquad t \ge 0, \quad \lambda > 0 \tag{2.36}$$

where $\lambda$ is a constant failure rate and $t$ is time.
The cumulative distribution function is given by

$$F(t) = 1 - e^{-\lambda t} \tag{2.37}$$

Weibull Distribution. This distribution is due to Weibull [8]. This
distribution can represent many different physical phenomena. The Weibull
distribution is a three-parameter distribution whose probability density
function is defined as follows:

$$f(t) = \frac{b}{\eta}\,(t - a)^{b-1} e^{-(t-a)^b/\eta} \qquad \text{for } t > a, \quad b, \eta, a > 0 \tag{2.38}$$

where $b$, $\eta$, and $a$ are shape, scale, and location parameters, respectively.
The distribution function is given by

$$F(t) = 1 - e^{-(t-a)^b/\eta} \qquad \text{for } t > a, \quad \eta, b > 0, \quad a > 0 \tag{2.39}$$

Rayleigh Distribution. This distribution has its applications in the theory
of sound and reliability engineering. The Rayleigh distribution is a special
case of the Weibull distribution ($b = 2$, $a = 0$). Therefore, the probability
density and distribution functions may be directly obtained from (2.38)
and (2.39), respectively, as follows:

$$f(t) = \frac{2}{\eta}\,t\,e^{-t^2/\eta}, \qquad t \ge 0, \quad \eta > 0 \tag{2.40}$$

$$F(t) = 1 - e^{-t^2/\eta} \tag{2.41}$$
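
The relationship between the Weibull and Rayleigh forms can be checked numerically. The sketch below assumes the three-parameter Weibull form with shape $b$, scale $\eta$, and location $a$, that is $F(t) = 1 - \exp[-(t-a)^b/\eta]$; the function names and the argument name `eta` are illustrative.

```python
from math import exp


def weibull_pdf(t, b, eta, a=0.0):
    """Weibull density with shape b, scale eta, location a."""
    if t <= a:
        return 0.0
    return (b / eta) * (t - a) ** (b - 1) * exp(-((t - a) ** b) / eta)


def weibull_cdf(t, b, eta, a=0.0):
    """Weibull distribution function with shape b, scale eta, location a."""
    if t <= a:
        return 0.0
    return 1.0 - exp(-((t - a) ** b) / eta)


def rayleigh_cdf(t, eta):
    """Rayleigh distribution function, the Weibull case b = 2, a = 0."""
    return 1.0 - exp(-t * t / eta)
```

Setting `b=2` and `a=0` in `weibull_cdf` reproduces `rayleigh_cdf`, and setting `b=1` and `a=0` gives the exponential distribution function with $\lambda = 1/\eta$.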



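Gamma Distribution. This distribution is an extension of the exponential
distribution. Some of its applications are found in life test problems.

Probability density and distribution functions are

$$f(t) = \frac{\lambda(\lambda t)^{\alpha-1} e^{-\lambda t}}{\Gamma(\alpha)}, \qquad t \ge 0, \quad \lambda, \alpha > 0 \tag{2.42}$$

and

$$F(t) = 1 - \sum_{i=0}^{\alpha-1} \frac{(\lambda t)^i e^{-\lambda t}}{i!}, \qquad t \ge 0, \quad \lambda, \alpha > 0 \tag{2.43}$$

In the case of $\alpha = 1$, this distribution reduces to exponential form.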

Extreme Value Distribution. It is a good representative of the failure
behavior of mechanical components. Probability density and distribution
functions of the extreme value distribution are as follows:

$$f(t) = e^{t} e^{-e^{t}}, \qquad -\infty < t < \infty \tag{2.44}$$

and

$$F(t) = 1 - e^{-e^{t}}, \qquad -\infty < t < \infty \tag{2.45}$$

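Normal Distribution (Gaussian). This is a two-parameter distribution,
which also has its applications in the reliability field. Its probability density
function is defined as follows:

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right], \qquad -\infty < t < \infty, \quad \sigma > 0, \quad -\infty < \mu < \infty \tag{2.46}$$

The cumulative distribution function is

$$F(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx \tag{2.47}$$

The numerical values of the cumulative function (2.47) may be obtained
from the standard tables.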



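Log Normal Distribution. This is another distribution often used to
represent the repair times of failed equipment. The probability density and
distribution functions are

$$f(t) = \frac{1}{(t-a)\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{[\ln(t-a) - \mu]^2}{2\sigma^2}\right\} \qquad \text{for } t > a > 0, \quad \sigma > 0 \tag{2.48}$$

and

$$F(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{a}^{t} \frac{1}{x-a} \exp\left\{-\frac{[\ln(x-a) - \mu]^2}{2\sigma^2}\right\} dx \qquad \text{for } t > a \tag{2.49}$$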



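Beta Distribution. The beta distribution is a two-parameter distribution
finding some uses in reliability engineering. The probability density
function of this distribution is defined as follows:

$$f(t) = \frac{(\gamma + \beta + 1)!}{\gamma!\,\beta!}\; t^{\gamma}(1-t)^{\beta} \qquad \text{for } 0 < t < 1, \quad \gamma > -1, \quad \beta > -1 \tag{2.50}$$

The cumulative distribution function is given by

$$F(t) = \frac{(\gamma + \beta + 1)!}{\gamma!\,\beta!} \int_{0}^{t} x^{\gamma}(1-x)^{\beta}\,dx \qquad \text{for } 0 < t < 1 \tag{2.51}$$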

The General Distribution (Hazard-Rate Model). This section presents a
general distribution [3] which might be useful to represent failure behavior
of items that are not adequately represented by the existing failure
distributions.

The hazard rate $\lambda(t)$ and reliability function $R(t)$ are defined by

$$\lambda(t) = k\lambda c\,t^{c-1} + (1-k)\,b\,t^{b-1}\beta\,e^{\beta t^{b}} \qquad \text{for } b, c, \beta, \lambda > 0, \quad 0 \le k \le 1, \quad t \ge 0 \tag{2.52}$$

and

$$R(t) = \exp\left[-k\lambda t^{c} - (1-k)\left(e^{\beta t^{b}} - 1\right)\right] \tag{2.53}$$

where b, c = shape parameters
      β, λ = scale parameters
      t    = time

In special cases, the above distribution becomes

      c = 1, b = 1       Makeham distribution
      k = 0, b = 1       extreme value
      k = 1              Weibull
      c = 0.5, b = 1     bathtub curve
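
The reliability function of this model should equal the exponential of minus the integrated hazard rate. The sketch below assumes the hazard rate $\lambda(t) = k\lambda c\,t^{c-1} + (1-k)\,b\,t^{b-1}\beta e^{\beta t^b}$ and the closed form $R(t) = \exp[-k\lambda t^c - (1-k)(e^{\beta t^b} - 1)]$, and compares the closed form with a midpoint-rule integration. Function and argument names are illustrative.

```python
from math import exp


def hazard(t, k, lam, c, b, beta):
    """Hazard rate of the general model for t > 0."""
    return (k * lam * c * t ** (c - 1)
            + (1 - k) * b * t ** (b - 1) * beta * exp(beta * t ** b))


def reliability(t, k, lam, c, b, beta):
    """Closed-form reliability of the general model."""
    return exp(-k * lam * t ** c - (1 - k) * (exp(beta * t ** b) - 1))


def reliability_numeric(t, k, lam, c, b, beta, steps=20000):
    """exp(-integral of the hazard over [0, t]) using the midpoint rule."""
    h = t / steps
    area = h * sum(hazard((i + 0.5) * h, k, lam, c, b, beta)
                   for i in range(steps))
    return exp(-area)
```

The midpoint rule never evaluates the hazard at $t = 0$, which matters when $c < 1$ or $b < 1$ because the hazard rate is then unbounded at the origin.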
The Hazard Rate Model Distribution. The hazard rate function $\lambda(t)$ [4] of
this model is defined as follows:

$$\lambda(t) = k\lambda \tanh \lambda t + (1-k)\,b\,t^{b-1}\beta\,e^{\beta t^{b}} \qquad \text{for } b, \beta, \lambda > 0, \quad 0 \le k \le 1, \quad t \ge 0 \tag{2.54}$$

where b    = the shape parameter
      β, λ = the scale parameters

The reliability function is given by

$$R(t) = \exp\left\{-k \ln\cosh \lambda t + (1-k)\left(1 - e^{\beta t^{b}}\right)\right\} \tag{2.55}$$

Figure 2.4 shows some selected curves ($\beta = \lambda = 1$) for the hazard rate
function expressed in (2.54).

Figure 2.4 Hazard rate function plot.
2.5 EXPECTED VALUE AND VARIANCE OF THE RANDOM
VARIABLES 

The expected value, $E(x)$, of a continuous random variable is defined as

$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx \tag{2.56}$$

Similarly, in the case of a discrete random variable $x$, the expected value,
$E(x)$, is given by

$$E(x) = \sum_{i=1}^{k} x_i f(x_i) \tag{2.57}$$

where $k$ is the number of discrete values of the random variable $x$.
The variance $\sigma^2(x)$ of a random variable $x$ is defined by

$$\sigma^2(x) = E(x^2) - [E(x)]^2 \tag{2.58}$$
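
For a discrete random variable the two definitions reduce to weighted sums. The following sketch computes the expected value as the probability-weighted sum of the values and the variance as $E(x^2) - [E(x)]^2$; the function name `mean_and_variance` is an illustrative assumption.

```python
def mean_and_variance(values, probs):
    """Return (E(x), variance) for a discrete random variable.

    values: the discrete values x_i
    probs:  the probabilities f(x_i), assumed to sum to one
    """
    mean = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return mean, second_moment - mean ** 2
```

For a fair six-sided die, `mean_and_variance(range(1, 7), [1 / 6] * 6)` gives a mean of 3.5 and a variance of 35/12, up to rounding.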



2.6 MOMENT-GENERATING FUNCTION 



The moment-generating function, $M_t(\theta)$, is defined for both continuous
and discrete cases as follows:
Continuous case

$$M_t(\theta) = \int_{-\infty}^{+\infty} \exp(\theta t)\, f(t)\,dt \tag{2.59}$$

and discrete case

$$M_t(\theta) = \sum_{k} \exp(\theta t_k)\, f(t_k) \tag{2.60}$$

For obtaining the $n$th moment about the origin we apply the following:

$$E(t^n) = \left.\frac{d^n M_t(\theta)}{d\theta^n}\right|_{\theta=0} \tag{2.61}$$

Therefore, the expected value and variance are given by these
relationships:

$$E(t) = \left.\frac{dM_t(\theta)}{d\theta}\right|_{\theta=0} \tag{2.62}$$

and

$$\sigma^2 = \left.\frac{d^2 M_t(\theta)}{d\theta^2}\right|_{\theta=0} - [E(t)]^2 \tag{2.63}$$
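
As a worked illustration (not part of the original text), the relationships above can be applied to the exponential density $f(t) = \lambda e^{-\lambda t}$, $t \ge 0$. The integral converges only for $\theta < \lambda$, which is sufficient because the derivatives are evaluated at $\theta = 0$.

```latex
\begin{align*}
M_t(\theta) &= \int_0^{\infty} e^{\theta t}\,\lambda e^{-\lambda t}\,dt
             = \frac{\lambda}{\lambda-\theta}, \qquad \theta < \lambda,\\
E(t) &= \left.\frac{dM_t(\theta)}{d\theta}\right|_{\theta=0}
      = \left.\frac{\lambda}{(\lambda-\theta)^2}\right|_{\theta=0}
      = \frac{1}{\lambda},\\
E(t^2) &= \left.\frac{d^2M_t(\theta)}{d\theta^2}\right|_{\theta=0}
        = \left.\frac{2\lambda}{(\lambda-\theta)^3}\right|_{\theta=0}
        = \frac{2}{\lambda^2},\\
\sigma^2 &= E(t^2) - [E(t)]^2 = \frac{1}{\lambda^2}.
\end{align*}
```

The mean $1/\lambda$ is the familiar mean time to failure for a constant failure rate.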



2.7 HAZARD PLOTTING FOR INCOMPLETE FAILURE DATA

This is a graphical data analysis technique [7] to establish failure
distributions for units with incomplete failure data. Failure data are
complete if the sample contains the failure times for all units. In
contrast, the failure data are called incomplete if a sample
contains both the failure times of failed units and running times of
unfailed units. The running times of unfailed units are called censoring times.

In addition, if in a sample all the unfailed units under observation have
different censoring times, then the failure data are called multiply censored.
Furthermore, if the unfailed units in a sample have the same censoring
time and in addition the censoring time is greater than the failure times,
then the failure data are called singly censored. This type of data results
when a sample of items undergoes life testing and testing is terminated
before all units fail, whereas multiply censored data result from any of
the following:

1. Units still operating when the data are collected.

2. Failures from extraneous causes.

3. Removal of units before failure.

Some of the advantages of this hazard plotting technique are as follows:

1. It provides a visibility tool because the pictorial plots are easy to grasp.

2. Data plots are an easy way to fit a theoretical distribution to data.

3. It makes it simple for the analyst to assess how adequately a theoretical
distribution fits the data.



2.7.1 Hazard Rate Plotting Theory

This technique is based on the distribution hazard rate function concept.
The following three basic relationships associated with the hazard plotting
technique are defined as

$$z(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - F(t)} \tag{2.64}$$

where z(t) = the hazard rate function
      R(t) = the reliability function
      F(t) = the failure distribution function

The cumulative hazard, $z_c(t)$, is given by

$$z_c(t) = \int_0^t z(t)\,dt = -\ln[1 - F(t)] \tag{2.65}$$

The cumulative distribution function $F(t)$ is defined as

$$F(t) = 1 - e^{-z_c(t)} \tag{2.66}$$

The above relationship is very useful to determine hazard function
properties.



2.7.2 Hazard Plotting for the Weibull Distribution 

This example is presented for the Weibull distribution. However, interested
readers should consult reference 7 for other distributions as well as for a
detailed presentation of this approach. Here the theory behind the Weibull
hazard plotting is briefly described.

The Weibull hazard, $z(t)$, and the density function, $f(t)$, are defined as
follows:

$$z(t) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha-1}, \qquad \alpha, \beta > 0, \quad t \ge 0 \tag{2.67}$$

and

$$f(t) = \frac{\alpha}{\beta}\left(\frac{t}{\beta}\right)^{\alpha-1} e^{-(t/\beta)^{\alpha}} \tag{2.68}$$

Both cumulative distribution and hazard functions are obtained by
integrating expressions (2.68) and (2.67), respectively, over the time
interval $[0, t]$ as follows:

$$F(t) = 1 - e^{-(t/\beta)^{\alpha}} \tag{2.69}$$

and

$$z_c(t) = \left(\frac{t}{\beta}\right)^{\alpha} \tag{2.70}$$

By taking the log of (2.70) we get

$$\ln(t) = \frac{1}{\alpha}\ln(z_c) + \ln(\beta) \tag{2.71}$$

The above equation indicates that $\ln(t)$ is a linear function of $\ln(z_c)$,
which indicates that log-log graph paper is the Weibull hazard paper.
Therefore, parameters $\alpha$ and $\beta$ can be estimated graphically by
using log-log paper.

The shape parameter, $\alpha$, is estimated from the fact that $1/\alpha$ is the slope
of the straight line. At $z_c = 1$, the value of $\beta$ is equal to time $t$; therefore,
by using this relationship, the value of the scale parameter $\beta$ can be
estimated.
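
The graphical estimate can be imitated numerically by fitting a straight line to the points $(\ln z_c, \ln t)$. The sketch below is a least-squares stand-in for reading the slope and intercept off hazard paper, assuming the linear relation $\ln t = (1/\alpha)\ln z_c + \ln\beta$; the function name and the synthetic data in the usage note are hypothetical, not taken from the text.

```python
from math import exp, log


def fit_weibull_hazard_line(times, cum_hazards):
    """Fit ln t = (1/alpha) ln z_c + ln beta; return (alpha, beta)."""
    xs = [log(z) for z in cum_hazards]
    ys = [log(t) for t in times]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope of the line is 1/alpha
    intercept = y_bar - slope * x_bar   # ln t at z_c = 1, i.e. ln beta
    return 1.0 / slope, exp(intercept)
```

With exact cumulative hazard values generated from $z_c = (t/\beta)^{\alpha}$ for $\alpha = 2$ and $\beta = 100$, the fit recovers both parameters. With real multiply censored data the cumulative hazard values would first have to be estimated from the ordered failure times, as described in reference 7.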



2.8 LAPLACE TRANSFORMS 

Some of these transforms are used in this book to solve systems of linear
differential equations with constant coefficients. Furthermore, these
transforms are applied in conjunction with other differential equation
techniques to solve simpler types of partial differential equations. The basic
definition of the Laplace transform, $f(s)$, of a function $f(t)$ is as follows:

$$f(s) = \int_0^{\infty} f(t)\,e^{-st}\,dt \tag{2.72}$$

where s = the Laplace transform variable
      t = the time variable

Example 1. Find the Laplace transform of the function $f(t) = t$, that is,

$$f(s) = \int_0^{\infty} t\,e^{-st}\,dt = \frac{1}{s^2} \qquad \text{for } s > 0 \tag{2.73}$$

Example 2. If $f(t) = e^{at}$, the Laplace transform of this exponential
function becomes

$$f(s) = \int_0^{\infty} e^{at} e^{-st}\,dt = \int_0^{\infty} e^{-(s-a)t}\,dt = \frac{1}{s-a} \qquad \text{for } s > a$$



2.8.1 Laplace Theorem of Derivatives

If $\mathcal{L}\{f(t)\} = f(s)$, then

$$\mathcal{L}\left\{\frac{df(t)}{dt}\right\} = s f(s) - f(0) \tag{2.74}$$

$$\mathcal{L}\left\{\frac{d^2 f(t)}{dt^2}\right\} = s^2 f(s) - s f(0) - f'(0) \tag{2.75}$$

2.8.2 Laplace Transform Initial-Value Theorem

If the following limits exist, then Abel's theorem is

$$\lim_{t \to 0} f(t) = \lim_{s \to \infty} s f(s) \tag{2.76}$$

2.8.3 Laplace Transform Final-Value Theorem

Provided the following limits exist, then the final-value theorem may be
stated as:

$$\lim_{t \to \infty} f(t) = \lim_{s \to 0} s f(s) \tag{2.77}$$
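Laplace Transform Table

f(t)                          f(s)
----------------------------  ------------------------------------
$df(t)/dt$                    $s f(s) - f(0)$
$d^2 f(t)/dt^2$               $s^2 f(s) - s f(0) - f'(0)$
$1$                           $1/s$,            $s > 0$
$e^{at}$                      $1/(s - a)$,      $s > a$
$t^k$, $k = 0, 1, 2, \ldots$  $k!/s^{k+1}$,     $s > 0$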



2.9 PARTIAL FRACTION TECHNIQUE

This is used when finding inverse Laplace transforms of a rational function
such as $G(s)/Q(s)$, where $G(s)$ and $Q(s)$ are polynomials, and the degree
of $G(s)$ is less than that of $Q(s)$. Therefore, the ratio $G(s)/Q(s)$ may
be written as the sum of rational functions or partial fractions in the
following forms:

$$\frac{A}{(as + \beta)^n}, \qquad \frac{As + B}{(as^2 + \beta s + C)^n}, \qquad n = 1, 2, 3, \ldots$$

Heaviside Theorem. This is used to obtain partial fractions and the inverse
of a rational function, $G(s)/Q(s)$.

The inverse of $G(s)/Q(s)$ may be written as:

$$\mathcal{L}^{-1}\left\{\frac{G(s)}{Q(s)}\right\} = \sum_{i=1}^{k} \frac{G(\beta_i)}{Q'(\beta_i)}\,e^{\beta_i t}$$

where the prime represents the derivative with respect to $s$, $\beta_i$ represents
the $i$th zero, and $k$ denotes the total number of distinct zeros of $Q(s)$.

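Example 3. Suppose

$$\frac{G(s)}{Q(s)} = \frac{s + 2}{(s - 4)(s - 6)}$$

find the inverse Laplace transform. Hence,

$$G(s) = s + 2, \qquad Q(s) = s^2 - 10s + 24, \qquad Q'(s) = 2s - 10$$
$$\beta_1 = 4, \qquad \beta_2 = 6, \qquad k = 2$$

Therefore,

$$\mathcal{L}^{-1}\left\{\frac{G(s)}{Q(s)}\right\} = \frac{G(4)}{Q'(4)}\,e^{4t} + \frac{G(6)}{Q'(6)}\,e^{6t} = -3e^{4t} + 4e^{6t}$$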
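
The Heaviside expansion for distinct zeros is mechanical enough to sketch in a few lines. The code below assumes $Q(s)$ has only simple (non-repeated) zeros that are supplied by the caller, and it represents polynomials as coefficient lists with the highest degree first; all names are illustrative.

```python
from math import exp


def poly_eval(coeffs, s):
    """Evaluate a polynomial (highest degree first) by Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * s + c
    return result


def poly_derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]


def heaviside_terms(g, q, zeros):
    """Return [(G(z)/Q'(z), z)] for each distinct zero z of Q."""
    dq = poly_derivative(q)
    return [(poly_eval(g, z) / poly_eval(dq, z), z) for z in zeros]


def inverse_at(terms, t):
    """Evaluate the inverse transform, the sum of a * exp(z t), at time t."""
    return sum(a * exp(z * t) for a, z in terms)
```

For Example 3, `heaviside_terms([1, 2], [1, -10, 24], [4, 6])` gives the coefficients -3 and 4 attached to the exponents 4 and 6.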



2.10 DIFFERENTIAL EQUATIONS 

The single-independent-variable linear first-order differential equations in
the reliability study are mainly associated with the Markov technique. In
this section we discuss how to solve such equations using integration
techniques.

The first-order first-degree linear differential equation may be written in
the following form

$$\frac{dP}{dt} + P\,G(t) = Q(t) \tag{2.80}$$

Since

$$\frac{d}{dt}\left[P\,e^{\int G(t)\,dt}\right] = e^{\int G(t)\,dt}\left[\frac{dP}{dt} + P\,G(t)\right] \tag{2.81}$$

The above expression shows that $e^{\int G(t)\,dt}$ is an integrating factor of the
differential equation (2.80).
The primitive of differential equation (2.80) may be written as

$$P\,e^{\int G(t)\,dt} = \int Q(t)\,e^{\int G(t)\,dt}\,dt + c \tag{2.82}$$

where $c$ is a constant.
Example 4. Obtain the solution equation for the following differential
equation:

$$\frac{dP}{dt} + 6P = 8 \tag{2.83}$$

Hence,

$$e^{\int G(t)\,dt} = e^{\int 6\,dt} = e^{6t} \tag{2.84}$$

By substituting (2.84) into (2.82), we get

$$P = \frac{4}{3} + c\,e^{-6t} \tag{2.85}$$

For given initial conditions, at $t = 0$, $P = 1$, the following value for the
constant $c$ is obtained from (2.85):

$$c = -\frac{1}{3} \tag{2.86}$$


2.10.1 Differential Equation

Solution with Laplace Transform Technique. Solving the same differential
equation

$$\frac{dP}{dt} + 6P = 8 \tag{2.87}$$

with the Laplace transform method for the same initial conditions we get

$$sP(s) - 1 + 6P(s) = \frac{8}{s} \tag{2.88}$$

The inverse Laplace transform of the above equation is

$$P(t) = \frac{4}{3} - \frac{1}{3}e^{-6t} \tag{2.89}$$

This shows that the solution obtained with (2.86) is the same as solution (2.89).
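
Either solution route can be cross-checked by integrating the differential equation numerically. The sketch below assumes the closed form $P(t) = 4/3 - (1/3)e^{-6t}$, which follows from $P = 4/3 + c\,e^{-6t}$ with $c = -1/3$, and compares it with a classical fourth-order Runge-Kutta integration of $dP/dt = 8 - 6P$ from $P(0) = 1$. The function names are illustrative.

```python
from math import exp


def closed_form(t):
    """P(t) = 4/3 - (1/3) exp(-6 t), the solution with P(0) = 1."""
    return 4.0 / 3.0 - exp(-6.0 * t) / 3.0


def rk4_solution(t_end, steps=1000, p0=1.0):
    """Integrate dP/dt = 8 - 6 P from P(0) = p0 with classical RK4."""
    def f(p):
        return 8.0 - 6.0 * p

    h = t_end / steps
    p = p0
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p
```

As $t$ grows, both results approach the steady-state value 4/3, which is also what the final-value theorem gives when applied to $P(s)$.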
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2 10. 2 System of Linear First Order Differential Equations 

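The following system of linear first-order differential equations with
constant coefficients is associated with the transition diagram* of Figure 2.5.
This transition diagram represents a three-state device, for example, a fluid
flow valve, an electronic diode, or an electrical switch.

$$ \frac{dP_0(t)}{dt} + (\lambda_1 + \lambda_2) P_0(t) = 0 \qquad (2.90) $$

$$ \frac{dP_1(t)}{dt} - \lambda_1 P_0(t) = 0 \qquad (2.91) $$

$$ \frac{dP_2(t)}{dt} - \lambda_2 P_0(t) = 0 \qquad (2.92) $$

At $t = 0$, $P_0(t) = 1$, and the other probabilities are zero.
The Laplace transforms of differential equations (2.90)-(2.92) are

$$ (s + \lambda_1 + \lambda_2) P_0(s) + 0 \cdot P_1(s) + 0 \cdot P_2(s) = P_0(0) \qquad (2.93) $$

$$ -\lambda_1 P_0(s) + s P_1(s) + 0 \cdot P_2(s) = P_1(0) \qquad (2.94) $$

$$ -\lambda_2 P_0(s) + 0 \cdot P_1(s) + s P_2(s) = P_2(0) \qquad (2.95) $$

In matrix form,

$$ \begin{bmatrix} s + \lambda_1 + \lambda_2 & 0 & 0 \\ -\lambda_1 & s & 0 \\ -\lambda_2 & 0 & s \end{bmatrix} \begin{bmatrix} P_0(s) \\ P_1(s) \\ P_2(s) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} $$

*The Markov technique is discussed in Chapter 3.

By solving the above we get

$$ P_0(s) = \frac{1}{s + \lambda_1 + \lambda_2} \qquad (2.96) $$

$$ P_1(s) = \frac{\lambda_1}{s(s + \lambda_1 + \lambda_2)} \qquad (2.97) $$

$$ P_2(s) = \frac{\lambda_2}{s(s + \lambda_1 + \lambda_2)} \qquad (2.98) $$

The inverse Laplace transforms of (2.96)-(2.98) are

$$ P_0(t) = e^{-(\lambda_1 + \lambda_2)t} \qquad (2.99) $$

$$ P_1(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2} \left( 1 - e^{-(\lambda_1 + \lambda_2)t} \right) \qquad (2.100) $$

$$ P_2(t) = \frac{\lambda_2}{\lambda_1 + \lambda_2} \left( 1 - e^{-(\lambda_1 + \lambda_2)t} \right) \qquad (2.101) $$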
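As a quick check on the three-state solution, the state probabilities can be evaluated directly and confirmed to sum to unity at any time. This is a minimal sketch; the function name and the failure-rate values used when calling it are illustrative assumptions, not data from the text.

```python
import math


def three_state_probabilities(lam1, lam2, t):
    """Return (P0, P1, P2) at time t for a three-state device.

    P0 is the probability of the normal state; P1 and P2 are the
    probabilities of the two failure states reached at the constant
    rates lam1 and lam2.
    """
    total = lam1 + lam2
    decay = math.exp(-total * t)
    p0 = decay
    p1 = lam1 / total * (1.0 - decay)
    p2 = lam2 / total * (1.0 - decay)
    return p0, p1, p2
```

At t = 0 the function returns (1, 0, 0), matching the stated initial conditions; as t grows, P1 and P2 approach the ratios of each failure rate to the total rate.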
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Fundamental Concepts 
in Reliability Engineering 



3.1 INTRODUCTION 

This chapter briefly presents the fundamental concepts in reliability
engineering, such as the general reliability function, redundant networks,
reliability evaluation techniques, reliability apportionment, and failure
mode and effect analysis.

The chapter includes a brief discussion of reliability evaluation techniques
such as the binomial, Markov processes (state space approach), decomposition,
minimal cut set, and network reduction techniques. The delta-star
technique is presented in more detail.



3.2 GENERAL RELIABILITY FUNCTION 

3.2.1 General Concepts

Suppose $n_0$ identical components are under test; after time t, $n_f(t)$ fail and
$n_s(t)$ survive. The reliability function R(t) is defined by

$$ R(t) = \frac{n_s(t)}{n_s(t) + n_f(t)} \qquad (3.1) $$

Since

$$ n_s(t) + n_f(t) = n_0 $$

the equation becomes

$$ R(t) = \frac{n_s(t)}{n_0} \qquad (3.2) $$

Since

$$ R(t) + F(t) = 1 $$

$$ R(t) = 1 - F(t) \qquad (3.3) $$

where F(t) is the failure probability at time t. To obtain the failure
probability F(t), substitute (3.2) into (3.3) and subtract it from unity, that is,

$$ F(t) = 1 - \frac{n_s(t)}{n_0} \qquad (3.4) $$

Since

$$ n_s(t) + n_f(t) = n_0 $$

$$ F(t) = \frac{n_f(t)}{n_0} $$

By using the above result in relationship (3.3) we get

$$ R(t) = 1 - F(t) = 1 - \frac{n_f(t)}{n_0} \qquad (3.5) $$

The derivative of (3.5) with respect to time t is

$$ \frac{dR(t)}{dt} = -\frac{1}{n_0} \frac{dn_f(t)}{dt} \qquad (3.6) $$

In the limiting case, as dt approaches zero, the right-hand side of (3.6),
apart from its sign, is the instantaneous failure density function f(t), that is,

$$ \frac{1}{n_0} \frac{dn_f(t)}{dt} = f(t) $$

Therefore, expression (3.6) becomes

$$ \frac{dR(t)}{dt} = -f(t) \qquad (3.7) $$

By using relationship (3.2), the other form of (3.6) may be written as

$$ \frac{dn_f(t)}{dt} = -n_0 \frac{dR(t)}{dt} = -\frac{dn_s(t)}{dt} \qquad (3.8) $$

3.2.2 Instantaneous Failure or Hazard Rate

If we divide both sides of (3.8) by $n_s(t)$, we get

$$ \frac{1}{n_s(t)} \frac{dn_f(t)}{dt} = -\frac{n_0}{n_s(t)} \frac{dR(t)}{dt} \qquad (3.9) $$

This is equal to the hazard rate $\lambda(t)$, that is,

$$ \lambda(t) = \frac{1}{n_s(t)} \frac{dn_f(t)}{dt} = -\frac{n_0}{n_s(t)} \frac{dR(t)}{dt} \qquad (3.10) $$

By substituting (3.2) and (3.7) into (3.10), we get an expression for the
hazard rate (instantaneous failure rate):

$$ \lambda(t) = -\frac{1}{R(t)} \frac{dR(t)}{dt} = \frac{f(t)}{R(t)} \qquad (3.11) $$

3.2.3 Reliability Function

Equation (3.11) may be rewritten in the following form:

$$ \frac{dR(t)}{R(t)} = -\lambda(t)\,dt \qquad (3.12) $$

By integrating both sides of (3.12) over the time range 0 to t, we get

$$ \int_{1}^{R(t)} \frac{1}{R(t)}\,dR(t) = -\int_0^t \lambda(t)\,dt $$

For the known initial condition that $R(t) = 1$ at $t = 0$, the above integral
expression becomes

$$ \ln R(t) = -\int_0^t \lambda(t)\,dt \qquad (3.13) $$

The following general reliability function is obtained from (3.13):

$$ R(t) = e^{-\int_0^t \lambda(t)\,dt} \qquad (3.14) $$

where $\lambda(t)$ is the time-dependent failure rate or instantaneous failure
rate. It is also called the hazard rate. The above expression is a general
reliability function. In other words, it can be used to obtain a component
reliability for any known failure time distribution.
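Because the general reliability function only requires the hazard rate, it can be evaluated numerically for any hazard function. The sketch below is illustrative only: it integrates a user-supplied hazard with the trapezoidal rule, and the function name and step count are assumptions rather than anything prescribed by the text.

```python
import math


def reliability(hazard, t, steps=10_000):
    """Evaluate R(t) = exp(-integral from 0 to t of hazard(x) dx).

    The integral is approximated with the composite trapezoidal rule.
    """
    if t == 0:
        return 1.0
    h = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        area += hazard(i * h)
    return math.exp(-area * h)
```

With a constant hazard, `reliability(lambda x: 0.001, 100.0)` reduces to the exponential case $e^{-\lambda t}$; a linearly increasing hazard gives a Rayleigh-type reliability curve.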
3.3 BATHTUB HAZARD RATE CURVE

This hazard rate curve, shown in Figure 3.1, is regarded as a typical hazard
rate curve, especially when representing the failure behavior of electronic
components. Mechanical components may or may not follow this type of
failure pattern.
As shown in Figure 3.1, the decreasing hazard rate region is sometimes called
the "burn-in period." There are also several other names for this period,
such as debugging period, infant mortality period, and break-in period.
Occurrence of failures during this period is normally attributed to design or
manufacturing defects.

The constant part of this bathtub hazard rate is called the "useful life
period," which begins just after the infant mortality period and ends just
before the "wear-out period."

The wear-out period begins when an equipment or component has aged
or bypassed its useful operating life. Consequently, the number of failures
during this time begins to increase. Failures that occur during the useful life
are called "random failures" because they occur randomly or, in other
words, unpredictably.

The hazard rate shown in Figure 3.1 can be represented by the following
function [15]:

$$ \lambda(t) = k \lambda c\, t^{c-1} + (1 - k)\, b\, t^{b-1} \beta e^{\beta t^{b}} \qquad (3.15) $$

for $b, c, \beta, \lambda > 0$; $0 \le k \le 1$; $t \ge 0$; and $c = 0.5$, $b = 1$

where b, c = shape parameters
$\beta$, $\lambda$ = scale parameters
t = time
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3.4 MEAN-TIME-TO-FAILURE (MTTF) 

The expected value E(t), in our case MTTF, of a probability density
function of the continuous random variable time t is given by

$$ E(t) = \mathrm{MTTF} = \int_0^{\infty} t f(t)\,dt \qquad (3.16) $$

where f(t) is the failure density function.

Example 1. Suppose a component's failure time follows the exponential
failure law. It follows that the component has a constant failure rate, $\lambda$ (i.e.,
the useful life period of the bathtub curve). Find the reliability function and
the mean-time-to-failure expressions.
From the known information,

$$ f(t) = \lambda e^{-\lambda t} \qquad (3.17) $$

and

$$ \lambda(t) = \lambda \qquad (3.18) $$

To obtain the reliability function, substitute (3.18) into (3.14):

$$ R(t) = e^{-\int_0^t \lambda\,dt} = e^{-\lambda t} \qquad (3.19) $$

In the case of MTTF, substitute (3.17) into (3.16) to get

$$ \mathrm{MTTF} = \int_0^{\infty} t \lambda e^{-\lambda t}\,dt \qquad (3.20) $$

The following is obtained by integrating the above expression by parts:

$$ \mathrm{MTTF} = \left[ -t e^{-\lambda t} \right]_0^{\infty} - \left[ \frac{e^{-\lambda t}}{\lambda} \right]_0^{\infty} $$

$$ \therefore\ \mathrm{MTTF} = \frac{1}{\lambda} \qquad (3.21) $$

The above expression represents the situation when a component's failure
times are exponentially distributed. MTTF is the reciprocal of the constant
hazard rate, $\lambda$, as given by (3.21).
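The result MTTF = 1/λ can be confirmed by evaluating the integral in (3.16) numerically. The following is a rough sketch under stated assumptions: the infinite upper limit is truncated at a multiple of 1/λ, and the function name and defaults are hypothetical choices for illustration.

```python
import math


def mttf_numeric(lam, upper_multiple=50.0, steps=200_000):
    """Approximate the MTTF integral for f(t) = lam * exp(-lam * t).

    The upper limit of integration is truncated at upper_multiple / lam,
    beyond which the integrand is negligible (an assumption of this sketch).
    """
    upper = upper_multiple / lam
    h = upper / steps

    def integrand(t):
        return t * lam * math.exp(-lam * t)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h
```

For a failure rate of 0.002 failure/hour the function returns a value very close to 500 hours, the reciprocal of the failure rate.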



3.5 RELIABILITY NETWORKS

This section describes the five typical reliability configurations.

Figure 3.2 A series system block diagram.

3.5.1 Series Structure

This arrangement represents a system whose subsystems or components
form a series network. If any one of the subsystems or components fails, the
series system experiences an overall system failure. A typical series system
configuration is shown in Figure 3.2.

If the series system component failures are statistically independent,
then the reliability $R_s$ of a series system with nonidentical components is
given by

$$ R_s = \prod_{i=1}^{n} R_i \qquad (3.22) $$

where n is the number of components or subsystems and $R_i$ is the
reliability of the ith component or subsystem.

If the failure times of components are exponentially distributed (i.e., if
components have constant failure rates), then the ith component reliability
may be obtained from (3.19), that is,

$$ R_i(t) = e^{-\lambda_i t} \qquad (3.23) $$

By substituting (3.23) into (3.22),

$$ R_s(t) = e^{-\sum_{i=1}^{n} \lambda_i t} \qquad (3.24) $$

MTTF is given by

$$ \mathrm{MTTF} = \int_0^{\infty} e^{-\sum_{i=1}^{n} \lambda_i t}\,dt = \frac{1}{\sum_{i=1}^{n} \lambda_i} \qquad (3.25) $$

The above expression shows that a series system MTTF is the reciprocal
of the sum of the series network component failure rates.
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Example 2. Two nonidentical pumps are required to run a system at a 
full load. Assume that pumps I and II have constant failure rates $\lambda_1 = 0.0001$
failure/hour and $\lambda_2 = 0.0002$ failure/hour, respectively. Calculate this series
system's mean time to failure and its reliability for a 100-hour mission; assume
that both pumps start operating at t = 0.

The following series system reliability $R_s$ for a 100-hour mission is
computed by using (3.24):

$$ R_s(100) = e^{-(0.0001 + 0.0002)(100)} = 0.97045 $$

By utilizing (3.25) we get

$$ \mathrm{MTTF} = \frac{1}{\lambda_1 + \lambda_2} = \frac{1}{0.0001 + 0.0002} \approx 3{,}333.3 \text{ hours} $$
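The two-pump calculation can be reproduced with a few lines of code. This is a minimal sketch assuming independent units with constant failure rates, as in (3.24) and (3.25); the function names are illustrative.

```python
import math


def series_reliability(failure_rates, t):
    """Series system reliability for constant failure rates, as in (3.24)."""
    return math.exp(-sum(failure_rates) * t)


def series_mttf(failure_rates):
    """Series system MTTF: reciprocal of the summed failure rates, as in (3.25)."""
    return 1.0 / sum(failure_rates)
```

Calling `series_reliability([0.0001, 0.0002], 100)` gives approximately 0.97045, and `series_mttf([0.0001, 0.0002])` gives approximately 3,333.3 hours.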



3.5.2 Parallel Configuration

This configuration is shown in Figure 3.3. This system will fail if and only
if all the units in the system malfunction. The model is based on the
assumption that all the system units are active and load sharing. In
addition, it is assumed that the component failures are statistically
independent. The reliability $R_p$ of a parallel structure with nonidentical
units or components is given by

$$ R_p = 1 - \prod_{i=1}^{n} (1 - R_i) \qquad (3.26) $$

Figure 3.3 A parallel network block diagram.

where n is the number of units and $R_i$ is the reliability of the ith component or
subsystem.

If the component failure rates are constant, then by substituting (3.19)
into (3.26),

$$ R_p(t) = 1 - \prod_{i=1}^{n} \left( 1 - e^{-\lambda_i t} \right) \qquad (3.27) $$

MTTF is obtained by integrating (3.27) over the interval $[0, \infty]$:

$$ \mathrm{MTTF} = \int_0^{\infty} R_p(t)\,dt = \int_0^{\infty} \left[ 1 - \prod_{i=1}^{n} \left( 1 - e^{-\lambda_i t} \right) \right] dt $$

$$ = \left( \frac{1}{\lambda_1} + \frac{1}{\lambda_2} + \cdots + \frac{1}{\lambda_n} \right) - \left( \frac{1}{\lambda_1 + \lambda_2} + \frac{1}{\lambda_1 + \lambda_3} + \cdots \right) + \left( \frac{1}{\lambda_1 + \lambda_2 + \lambda_3} + \frac{1}{\lambda_1 + \lambda_2 + \lambda_4} + \cdots \right) - \cdots + (-1)^{n+1} \frac{1}{\sum_{i=1}^{n} \lambda_i} \qquad (3.28) $$

For identical components, the above equation reduces to

$$ \mathrm{MTTF} = \frac{1}{\lambda} \sum_{i=1}^{n} \frac{1}{i} \qquad (3.29) $$

Example 3. Suppose two identical motors are operating in a redundant
configuration. If either of the motors fails, the remaining motor can still
operate at the full system load. Assume both motors are identical and their
failure rates are constant. In addition, motor failures are statistically
independent. If both motors start operating at t = 0, find the following:

1. System reliability for given $\lambda = 0.0005$ failure/hour, t = 400 hours
(mission time).

2. Mean-time-to-failure (MTTF).

For identical units, (3.27) becomes

$$ R(t) = 2e^{-\lambda t} - e^{-2\lambda t} $$

Since $\lambda = 0.0005$ failure/hour and t = 400 hours,

$$ R(400) = 2e^{-(0.0005)(400)} - e^{-(2)(0.0005)(400)} = 0.9671 $$

MTTF is obtained from (3.29):

$$ \mathrm{MTTF} = \frac{1}{\lambda} \left( 1 + \frac{1}{2} \right) = \frac{3}{2\lambda} = \frac{1.5}{0.0005} = 3{,}000 \text{ hours} $$
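The motor example can be reproduced as follows. This is a small sketch assuming active, independent units with constant failure rates, following (3.27) and (3.29); the function names are illustrative.

```python
import math


def parallel_reliability(failure_rates, t):
    """Active parallel system reliability for constant failure rates, as in (3.27)."""
    unreliability = 1.0
    for lam in failure_rates:
        unreliability *= 1.0 - math.exp(-lam * t)
    return 1.0 - unreliability


def parallel_mttf_identical(lam, n):
    """MTTF of n identical active parallel units, as in (3.29)."""
    return sum(1.0 / i for i in range(1, n + 1)) / lam
```

Here `parallel_reliability([0.0005, 0.0005], 400)` is approximately 0.9671 and `parallel_mttf_identical(0.0005, 2)` is 3,000 hours, in agreement with Example 3.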



3.5.3 Standby Redundancy

This type of redundancy represents a situation with one unit operating and n
units as standbys. The standby redundancy arrangement is shown in
Figure 3.4. Unlike a parallel network, where all units in the configuration
are active, the standby units are not active.

The reliability of the (n + 1)-unit system, in which one unit is operating
and n units remain on standby until the operating unit fails, is given
by

$$ R_{sd}(t) = \sum_{i=0}^{n} \frac{(\lambda t)^{i} e^{-\lambda t}}{i!} \qquad (3.30) $$

The above equation is true if the following are true:

1. The switching arrangement is perfect.

2. The units are identical.

3. The unit failure rates are constant.

4. The standby units are as good as new.

5. The unit failures are statistically independent.

In the case of (n + 1) nonidentical units whose failure time density
functions are different, the standby redundant system failure density is
given by

$$ f_{sd}(t) = \int_{y_n = 0}^{t} \int_{y_{n-1} = 0}^{y_n} \cdots \int_{y_1 = 0}^{y_2} f_1(y_1)\, f_2(y_2 - y_1) \cdots f_{n+1}(t - y_n)\, dy_1\, dy_2 \cdots dy_n \qquad (3.31) $$

Figure 3.4 A standby redundancy model.

Thus, the system reliability can be obtained by integrating $f_{sd}(t)$ over the
interval $[t, \infty]$ as follows:

$$ R_{sd}(t) = \int_t^{\infty} f_{sd}(t)\,dt \qquad (3.32) $$

Example 4. Assume that a system contains two identical units, that is,
one operating and the other on standby. Furthermore, the
unit failure rates are constant. In addition, assume that the standby unit is
as good as new at the beginning of its mission. Evaluate the system reliability
for a 100-hour mission for the given unit failure rate, $\lambda = 0.001$ failure/hour.
Equation (3.30) gives us

$$ R_{sd}(t) = (1 + \lambda t) e^{-\lambda t} \qquad (3.33) $$

For known t = 100 hours and $\lambda = 0.001$ failure/hour, the system reliability is

$$ R_{sd}(100) = (1 + 0.1) e^{-0.1} = 0.9953 $$
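The standby formula (3.30) is a truncated Poisson sum and is straightforward to evaluate. The sketch below assumes the five conditions listed for (3.30), in particular perfect switching and identical units; the function name is illustrative.

```python
import math


def standby_reliability(lam, t, n_standby):
    """Reliability of one operating unit backed by n_standby identical standbys.

    Implements the sum in (3.30): the system survives if at most
    n_standby failures occur in (0, t).
    """
    x = lam * t
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(n_standby + 1))
```

With one standby unit, `standby_reliability(0.001, 100, 1)` is approximately 0.9953, as in Example 4; with no standby units the function reduces to the single-unit exponential reliability.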

3.5.4 k-out-of-n Configuration

This is another form of redundancy. It is used where a specified number of
units must be good for system success. The series and parallel
configurations in the preceding sections are special cases of this configuration,
that is, k = n and k = 1, respectively.

The reliability of this type of configuration is obtained by applying the
binomial distribution. The system reliability for k-out-of-n
independent and identical units is given by

$$ R_{k/n} = \sum_{i=k}^{n} \binom{n}{i} R^{i} (1 - R)^{n-i} \qquad (3.34) $$

For the constant unit failure rate $\lambda$, the above equation becomes

$$ R_{k/n}(t) = \sum_{i=k}^{n} \binom{n}{i} e^{-i\lambda t} \left( 1 - e^{-\lambda t} \right)^{n-i} \qquad (3.35) $$

Figure 3.5 A five dissimilar unit bridge network.

Example 5. Assume that in (3.35) k = 2, n = 3, and $\lambda = 0.001$ failure/hour.
Therefore the system reliability for a 200-hour mission is

$$ R_{2/3}(200) = 3e^{-(2)(0.001)(200)} - 2e^{-(3)(0.001)(200)} = 0.9133 $$
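The binomial sum for a k-out-of-n system can be evaluated directly. This is a minimal sketch for identical, independent units with a constant failure rate; the function name is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
import math


def k_out_of_n_reliability(k, n, lam, t):
    """Reliability of a k-out-of-n system of identical units, as in (3.35)."""
    r = math.exp(-lam * t)  # single-unit reliability at time t
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))
```

A 2-out-of-3 system with a failure rate of 0.001 failure/hour gives approximately 0.9133 for a 200-hour mission. Setting k = n reproduces the series result and k = 1 the parallel result.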

3.5.5 Bridge Configuration

This network is shown in Figure 3.5. The critical element of the
configuration is labeled as "3." For nonidentical and independent units, the
five-unit bridge network reliability equation from reference 27 is

$$ R_b = 2R_1R_2R_3R_4R_5 - R_2R_3R_4R_5 - R_1R_3R_4R_5 - R_1R_2R_4R_5 - R_1R_2R_3R_5 - R_1R_2R_3R_4 + R_1R_3R_5 + R_2R_3R_4 + R_1R_4 + R_2R_5 \qquad (3.36) $$

In the case of identical units, the above equation reduces to

$$ R_b = 2R^5 - 5R^4 + 2R^3 + 2R^2 \qquad (3.37) $$

For units with constant failure rate, substitute (3.19) into (3.37), that is,

$$ R(t) = 2e^{-5\lambda t} - 5e^{-4\lambda t} + 2e^{-3\lambda t} + 2e^{-2\lambda t} \qquad (3.38) $$

MTTF is obtained by integrating (3.38) over the interval $[0, \infty]$, that is,

$$ \mathrm{MTTF} = \int_0^{\infty} \left( 2e^{-5\lambda t} - 5e^{-4\lambda t} + 2e^{-3\lambda t} + 2e^{-2\lambda t} \right) dt = \frac{49}{60\lambda} \qquad (3.39) $$

Example 6. Compute the system reliability and MTTF for a bridge
configuration of five independent and identical units. Suppose

$\lambda = 0.0005$ failure/hour
t = 100 hours

and all units start operating at t = 0. Equation (3.38) gives

$$ R(100) = 2e^{-0.25} - 5e^{-0.2} + 2e^{-0.15} + 2e^{-0.1} \approx 0.9950 $$

The following MTTF result is obtained by substituting the given failure
data in (3.39):

$$ \mathrm{MTTF} = \frac{49}{60 \times 0.0005} \approx 1{,}633.3 \text{ hours} $$
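The bridge expressions can be evaluated numerically as a cross-check. The sketch below implements the general five-unit formula (3.36) and the identical-unit MTTF (3.39); the function names are illustrative, and unit 3 is taken as the bridging element as stated in the text.

```python
import math


def bridge_reliability(r1, r2, r3, r4, r5):
    """Five-unit bridge reliability for independent units, as in (3.36)."""
    return (
        2 * r1 * r2 * r3 * r4 * r5
        - r2 * r3 * r4 * r5
        - r1 * r3 * r4 * r5
        - r1 * r2 * r4 * r5
        - r1 * r2 * r3 * r5
        - r1 * r2 * r3 * r4
        + r1 * r3 * r5
        + r2 * r3 * r4
        + r1 * r4
        + r2 * r5
    )


def bridge_mttf_identical(lam):
    """MTTF of a bridge of five identical units, as in (3.39)."""
    return 49.0 / (60.0 * lam)
```

For identical units with unit reliability `r = math.exp(-0.0005 * 100)`, `bridge_reliability(r, r, r, r, r)` is approximately 0.9950, and `bridge_mttf_identical(0.0005)` is approximately 1,633.3 hours.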

3.6 RELIABILITY EVALUATION TECHNIQUES

This section briefly describes reliability evaluation techniques. Readers with
no in-depth knowledge of these techniques should consult references 36 and
37.

3.6.1 Binomial Theorem to Evaluate Network Reliability

This is one of the simplest methods to evaluate system reliability. However,
it is only useful for evaluating the reliability of simple systems of series and
parallel form. For complex systems it is quite a trying task.

The following is always true for the binomial expression in reliability
engineering:

$$ (p + q)^n = 1 \qquad (3.40) $$

where p = the component probability of success
q = the component probability of failure
n = the number of identical components

Example 7. When two identical components form a series or parallel
configuration, we obtain the resulting equation from (3.40):

$$ (p + q)^2 = p^2 + 2pq + q^2 \qquad (3.41) $$

Here $p^2$ = probability of both components operating
$2pq$ = probability of one component failed and one working
$q^2$ = probability of both components failed

Therefore, the reliability equation of a two-unit parallel system is given by
the first two terms of the right-hand side of expression (3.41):

$$ R = p^2 + 2pq \qquad (3.42) $$

Since $p + q = 1$, then

$$ p = 1 - q \qquad (3.43) $$

By substituting (3.43) into (3.42) we get

$$ R = 1 - q^2 \qquad (3.44) $$

This is the reliability equation for a parallel system of two identical and
independent units.

3.6.2 State Space Approach (Markov Processes)

The state space approach is a very general approach and can generally
handle more cases than any other method. It can be used when the
components are independent as well as for systems involving dependent
failure and repair modes. There is no conceptual difficulty in incorporating
multistate components and modeling common-cause failures.

The method proceeds by the enumeration of system states. The state
probabilities are then calculated, and the steady-state reliability measures
can be calculated using the frequency balancing approach [39]. The
pertinent relationships are given below:

1. Unavailability or the probability of failure is given by

$$ P_f = \sum_{i \in F} P_i \qquad (3.45) $$

where $P_i$ = probability of being in state i
F = subset of failure states

2. Frequency of failure [39], that is, the frequency of encountering subset F,
is given by

$$ f_f = \sum_{i \in (S - F)} P_i \sum_{j \in F} \lambda_{ij} \qquad (3.46) $$

where S = system state space
$\lambda_{ij}$ = transition rate from state i to state j

3. Mean duration of the failure state is [39]

$$ d_f = \frac{P_f}{f_f} \qquad (3.47) $$

When the components are independent, the state probabilities can be
obtained using the multiplication rule. When, however, dependent failure
or repair modes are involved, the state probabilities have to be obtained by
solving a set of linear algebraic equations. Probably the only serious
problem with this approach, particularly when constant transition rates are
assumed, is that for large systems it could become unmanageable. In
many situations, the problem can be handled using a computer-generated
state transition matrix and reducing the size of the state space by
truncation, sequential truncation, and the concept of state merging. These
techniques are discussed in reference 39. Examples of the application of
these techniques for large systems can be found in reference 40.
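Relationships (3.45)-(3.47) can be illustrated with the smallest possible state space. The following sketch is an assumption-laden illustration, not an example from the text: it treats a single repairable component with one up state and one down state, constant failure rate and constant repair rate, and uses the well-known steady-state probabilities for that two-state model.

```python
def two_state_measures(failure_rate, repair_rate):
    """Steady-state measures for a single repairable component.

    State "up" leaves at failure_rate; state "down" leaves at repair_rate.
    Returns (unavailability, failure frequency, mean failure duration).
    """
    p_up = repair_rate / (failure_rate + repair_rate)
    p_down = failure_rate / (failure_rate + repair_rate)
    unavailability = p_down                      # (3.45): sum over failure states
    frequency = p_up * failure_rate              # (3.46): flow from S - F into F
    mean_duration = unavailability / frequency   # (3.47)
    return unavailability, frequency, mean_duration
```

For this model the mean duration of failure reduces to the reciprocal of the repair rate, which gives a convenient sanity check on the three relationships.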

3.6.3 Network Reduction Technique

This is a simple and useful procedure for systems consisting of series and
parallel subsystems. Configurations such as bridge networks can be
analyzed using delta-star conversions [12, 13, 41]. Some approximation,
however, is involved in the use of these techniques [41]. The technique
consists of sequentially reducing the parallel and series configurations to
equivalent units until the whole network reduces to a single unit. The
bridge configurations can be converted to series and parallel equivalents
by using delta-star conversions or the decomposition approach.

The primary advantage of this method is that it is easy to understand
and apply; however, it is generally not suitable for considering degraded
failure modes of components and systems. The independence of
components generally has to be assumed.

3.6.4 Decomposition Method

This method decomposes a complex system into simpler subsystems by the
application of conditional probability and conditional frequency theorems
[37]. The reliability measures of the simpler subsystems are calculated and then
combined to obtain the results for the system. This method can be used to
simplify both the state space and the network approach. Examples of
application in both of these areas can be found in reference 37. The
success of the method depends upon the choice of the key component, that
is, the component used for decomposing the network. If this component is
not judiciously chosen, the final results will be the same, but the
computations could be far more tedious. For a relatively complex network, the
choice of proper key components to decompose the system into series
parallel configurations can be a trying task.
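The decomposition idea can be sketched for the five-unit bridge of Section 3.5.5. The code below is an illustration under stated assumptions: unit 3 is chosen as the key component, the units are independent, and the topology is the one implied by (3.36), with units 1 and 4 on one path and units 2 and 5 on the other. The function name is hypothetical.

```python
def bridge_by_decomposition(r1, r2, r3, r4, r5):
    """Bridge reliability obtained by conditioning on the key component (unit 3)."""
    # Unit 3 working: (1 parallel 2) in series with (4 parallel 5).
    given_up = (1 - (1 - r1) * (1 - r2)) * (1 - (1 - r4) * (1 - r5))
    # Unit 3 failed: (1 series 4) in parallel with (2 series 5).
    given_down = 1 - (1 - r1 * r4) * (1 - r2 * r5)
    # Total probability theorem over the two states of unit 3.
    return r3 * given_up + (1 - r3) * given_down
```

For identical units the result agrees with the polynomial $2R^5 - 5R^4 + 2R^3 + 2R^2$; for example, a unit reliability of 0.9 gives 0.97848.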

3.6.5 Minimal Cut Set Method

A general approach to the solution of reliability block diagrams is based
on minimal cut sets or minimal tie sets. This approach is very suitable for
computer application. The minimal cut sets of a reliability block diagram
can be identified using special algorithms. Once the minimal cut sets have
been enumerated, the reliability measures can be calculated using the
following relationships:

$$ P_f = \left[ P(\bar{C}_1) + \cdots + P(\bar{C}_n) \right] - \left[ P(\bar{C}_1 \cap \bar{C}_2) + \cdots \right] + \cdots + (-1)^{n-1} P(\bar{C}_1 \cap \bar{C}_2 \cap \cdots \cap \bar{C}_n) \qquad (3.48) $$

and

$$ f_f = \left[ P(\bar{C}_1)\bar{\mu}_1 + \cdots + P(\bar{C}_n)\bar{\mu}_n \right] - \left[ P(\bar{C}_1 \cap \bar{C}_2)\bar{\mu}_{1+2} + \cdots \right] + \cdots + (-1)^{n-1} P(\bar{C}_1 \cap \cdots \cap \bar{C}_n)\,\bar{\mu}_{1+2+\cdots+n} \qquad (3.49) $$

where $C_i$, $\bar{C}_i$ = the minimal cut set i and the failure of the components in $C_i$,
respectively
$\mu_i$ = the repair rate of component i
$\bar{\mu}_{i+j+k}$ = the sum of $\mu_l$ over all $l \in (C_i \cup C_j \cup C_k)$, that is, the sum of the
repair rates of the components which belong to any or all
of $C_i$, $C_j$, $C_k$
$P_f$ = probability of failure
$f_f$ = frequency of failure

As in the case of all network methods, this technique is not suitable for
incorporating degraded modes of operation. For m minimal cut sets, the
number of terms to be evaluated is $2^m$. This could create computational
problems, which could be partly alleviated using the concept of probability
and frequency bounds [42]. Similarly, one may obtain tie sets for a
complex system (described in detail in reference 37).
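The inclusion-exclusion expansion over minimal cut sets can be sketched in a few lines for independent components. The code below is an illustrative assumption-based sketch: cut sets are given as sets of component labels, component failure probabilities are supplied in a dictionary, and the joint failure of several cut sets is the product of the failure probabilities of every component in their union. The function name and data layout are hypothetical.

```python
from itertools import combinations


def failure_probability(cut_sets, q):
    """System failure probability from minimal cut sets by inclusion-exclusion.

    cut_sets: list of sets of component labels.
    q: dict mapping each component label to its failure probability.
    Components are assumed statistically independent.
    """
    total = 0.0
    for size in range(1, len(cut_sets) + 1):
        sign = (-1) ** (size - 1)
        for group in combinations(cut_sets, size):
            failed = set().union(*group)  # all components failed in this term
            term = 1.0
            for component in failed:
                term *= q[component]
            total += sign * term
    return total
```

For the five-unit bridge with unit 3 as the bridging element, the minimal cut sets are {1, 2}, {4, 5}, {1, 3, 5}, and {2, 3, 4}. With every component failure probability equal to 0.1, the function returns 0.02152, which is one minus the bridge reliability obtained from (3.37) with R = 0.9.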



3.6.6 Delta-Star Technique

To analyze a complex structure such as a bridge, the delta-star
transformation [12, 13] easily transforms the configuration into series and parallel
combinations. We derive the delta-star equivalent formulas by obtaining
the equivalent legs of the block diagrams of Figure 3.6.

Consider, for example, three components of a system with reliabilities
$R_{AB}$, $R_{AC}$, and $R_{CB}$ connected to form the delta configuration shown in
Figure 3.6. This configuration yields the star equivalent with reliabilities
$R_A$, $R_B$, and $R_C$.

Figure 3.6 Delta-star reliability equivalent.

Now consider the transformation steps indicated by Figure 3.7 to derive
the delta-to-star equivalent. The application of independent event probability
laws to components connected in series and parallel combinations as
shown in Figure 3.7(a-c) will yield (3.55), (3.56), and (3.57), respectively.

For a simple independent series, the total system reliability is given by

$$ R_T = R_A R_B \cdots R_N \qquad (3.50) $$

where $R_A, R_B, \ldots, R_N$ are the reliabilities of the N components.

The simple independent parallel case yields the total system failure
probability as

$$ F_T = F_A F_B \cdots F_N = (1 - R_A)(1 - R_B) \cdots (1 - R_N) \qquad (3.51) $$

where $F_A, F_B, \ldots, F_N$ are the unreliabilities of the N components.

Applying (3.50) and (3.51) to the legs presented in Figure 3.7, their
corresponding relationships are

$$ R_A R_B = 1 - (1 - R_{AB})(1 - R_{AC} R_{CB}) \qquad (3.52) $$

$$ R_B R_C = 1 - (1 - R_{CB})(1 - R_{AB} R_{AC}) \qquad (3.53) $$

$$ R_A R_C = 1 - (1 - R_{AC})(1 - R_{CB} R_{AB}) \qquad (3.54) $$

Figure 3.7 Delta-star equivalent legs.

From these three simultaneous equations the following delta-to-star
relationships result:

$$ R_A = \sqrt{ \frac{ \left[ 1 - (1 - R_{AB})(1 - R_{AC} R_{CB}) \right] \left[ 1 - (1 - R_{AC})(1 - R_{CB} R_{AB}) \right] }{ 1 - (1 - R_{CB})(1 - R_{AB} R_{AC}) } } \qquad (3.55) $$

$$ R_B = \sqrt{ \frac{ \left[ 1 - (1 - R_{AB})(1 - R_{AC} R_{CB}) \right] \left[ 1 - (1 - R_{CB})(1 - R_{AB} R_{AC}) \right] }{ 1 - (1 - R_{AC})(1 - R_{CB} R_{AB}) } } \qquad (3.56) $$

$$ R_C = \sqrt{ \frac{ \left[ 1 - (1 - R_{AC})(1 - R_{CB} R_{AB}) \right] \left[ 1 - (1 - R_{CB})(1 - R_{AB} R_{AC}) \right] }{ 1 - (1 - R_{AB})(1 - R_{AC} R_{CB}) } } \qquad (3.57) $$

Figure 3.8 A two-state device bridge structure.

Example 8. A bridge network example for independent units is solved to
illustrate the use of these formulas. Figure 3.8 illustrates the structure for a
simple bridge; the letters A, B, and C are used to label elements of the
delta configuration.

We obtain the equivalent star configuration values by using (3.55),
(3.56), and (3.57):

$$ \therefore\ R_A = 0.9948, \quad R_B = 0.9930, \quad R_C = 0.9954 $$

The network shown in Figure 3.8 may be expressed as its equivalent as
shown in Figure 3.9. The reliability equation for this structure is

^[l-n-K./^Hl-/^/?,)]**- 



Figure 3.9 A transformed two-state device structure.



Numerically the value of the total bridge reliability is

$$ R_T = 0.987 $$

for the given component reliability values.
Analyzing this bridge structure with the event space method also yields

$$ R_T = 0.987 $$

Equations (3.55)-(3.57) are interrelated. Therefore, computing the
value of the first equation helps to compute the values of the other two
equations. This minimizes the computing time.
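The delta-to-star relationships can be sketched in code. The example below is illustrative only: the function name is hypothetical, and it assumes that each star pair (for instance $R_A R_B$) equals the reliability of the corresponding delta leg in parallel with the series combination of the other two legs, from which the individual star reliabilities follow as square roots. The three leg values are computed once and reused, which reflects the remark that the equations are interrelated.

```python
import math


def delta_to_star(r_ab, r_ac, r_cb):
    """Convert delta leg reliabilities to star reliabilities (R_A, R_B, R_C)."""
    # Terminal-pair reliabilities of the delta configuration.
    leg_ab = 1 - (1 - r_ab) * (1 - r_ac * r_cb)  # equals R_A * R_B
    leg_bc = 1 - (1 - r_cb) * (1 - r_ab * r_ac)  # equals R_B * R_C
    leg_ac = 1 - (1 - r_ac) * (1 - r_ab * r_cb)  # equals R_A * R_C
    r_a = math.sqrt(leg_ab * leg_ac / leg_bc)
    r_b = math.sqrt(leg_ab * leg_bc / leg_ac)
    r_c = math.sqrt(leg_ac * leg_bc / leg_ab)
    return r_a, r_b, r_c
```

Multiplying any two of the returned star values reproduces the corresponding terminal-pair reliability of the delta, which is a simple way to validate an implementation.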



3.7 FAILURE MODE AND EFFECT ANALYSIS (FMEA)

This is an important step in a reliability and maintainability assurance
program. FMEA is a tool to evaluate a design at the initial stage from the
reliability aspect. It helps to identify the need for, and the effects of,
design changes.

Furthermore, the procedure demands listing on paper the potential failure
modes of each and every component and their effects on the listed
subsystems.

FMEA becomes failure modes, effects, and criticality analysis (FMECA)
if criticalities or priorities are assigned to failure mode effects.
Some of the main characteristics of this procedure are as follows:

1. This is a routine upward procedure that begins from the detailed level.

2. By evaluating the failure effects of each component, the whole system is
screened completely.

3. It improves communication among design interface personnel.

4. It identifies weak areas in a system design and indicates areas where
further or detailed analysis is desirable.

Some of the main steps to perform FMEA are shown in Figure 3.10.
3.8 RELIABILITY APPORTIONMENT

To achieve the required reliability of a complex system, it is a routine
procedure to set reliability targets for subsystems. The main advantage is
that once the individual subsystem reliability goals are achieved, the
overall system goal will automatically be fulfilled.

The process of setting such reliability goals is known as reliability
apportionment. Normally this is accomplished before the key design or
development decisions are made.

Figure 3.10 FMEA flow chart. Legible boxes, in order: "[...] and detailed
requirements"; "List all components and subsystems in a system"; "List
necessary failure modes, the description and the identification [...]";
"Assign failure rates to each [...]"; "[...] each failure mode in question";
"Review each critical failure mode and take necessary action."

Some of the reliability apportionment techniques are described in the
following sections.

3.8.1 Reliability/Cost Models

Before applying this reliability/cost procedure, the relationship between
reliability and cost must be known for each subsystem in order to meet the
system reliability goal at minimum cost. However, the main drawback of this
procedure is the lack of availability of cost data, that is, cost at a given
level of reliability.

3.8.2 Similar Familiar Systems Reliability Apportionment Approach

This approach is based upon the familiarity of the designer with similar
systems or subsystems. Its main weakness stems from the fact that the
reliability and life cycle cost of earlier similar designs have to be assumed
adequate when designing new systems. By applying this technique, the
failure data collected on similar systems from various sources can be
utilized.

3.8.3 Factors of Influence Method

This procedure is based purely upon the following important factors that
affect the system in question:

1. Complexity/Time. Complexity relates to the number of
subsystem parts, whereas time relates to the relative operational time
during the total functional period.

2. Environmental Factor. This concerns each subsystem's operating
environmental conditions, such as temperature, vibration, and humidity. In
other words, it deals with the susceptibility or exposure of subsystems to
such environmental conditions.

3. State-of-the-Art. This factor takes care of advancement in the
state-of-the-art for a particular subsystem or component.

4. Subsystem Failure Criticality. This factor includes the criticality effect
of a subsystem failure on the system. For example, the failure of some
auxiliary instruments in an aircraft may not be as critical as the failure
of an engine.

When applying this factors of influence procedure, each and every
subsystem is rated with respect to the influential factors, and one can
assign a number between 1 and 10, where 1 is allocated to a subsystem
least affected by the factor in question and 10 is allocated to a subsystem
most affected by the factors of influence. Thus reliability can be allocated
by using the weight of these assigned numbers for all factors.
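The text does not specify how the ratings are turned into reliability targets, so the following sketch is only one possible scheme, stated here as an assumption: the ratings of each subsystem are multiplied into a single score, the scores are normalized into weights, and a series system with goal $R_{sys}$ assigns subsystem i the target $R_{sys}^{w_i}$, so that the product of the targets equals the system goal. The function name, the rating data, and the subsystem names are hypothetical.

```python
def allocate_reliability(system_goal, ratings):
    """Allocate a series-system reliability goal from 1-10 influence ratings.

    ratings: dict mapping subsystem name -> list of factor ratings (1..10).
    Assumed scheme: weight = product of ratings, normalized over subsystems;
    subsystem target = system_goal ** weight.
    """
    scores = {}
    for name, factors in ratings.items():
        score = 1
        for rating in factors:
            score *= rating
        scores[name] = score
    total = sum(scores.values())
    return {name: system_goal ** (score / total) for name, score in scores.items()}
```

Under this scheme a subsystem with high ratings (complex, harsh environment, immature technology) receives a larger share of the allowed unreliability and therefore a lower reliability target, while the product of all targets still meets the system goal.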

3.8.4 Combined Familiar Systems and Factors of Influence Method

Both the familiar systems and the factors of influence methods have their
weaknesses when they are used individually. However, combining the two
methods produces better results because data are used from similar
subsystems as well as when new subsystems are designed under different
factors of influence.
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Fault Trees and 
Common Cause Failures 

4.1 INTRODUCTION 

This chapter discusses fault trees, which are used to analyze complex systems.
This technique has rapidly gained favor because of its versatility in degree
of detail of complex systems. The fault tree technique was originated by
H. A. Watson of Bell Telephone Laboratories to analyze the Minuteman
Launch Control System. It was further refined by a study team at the Bell
Telephone Laboratories.

Further work on fault tree techniques was carried out at the Boeing
company, in which Haasl [37] played an instrumental role. A turning point
took place in 1965 when several papers on the technique were presented at
the 1965 Safety Symposium held at the University of Washington, Seattle
[37]. Ever since, several experts have made further advances in this
technique.

Another symposium on the technique was organized at the
University of California at Berkeley [2]. A comprehensive bibliography on the
technique is presented in reference 21.

Most of the material presented in this chapter is taken from the
fault tree bibliography listed at the end of this chapter. The second part of this
chapter deals with the subject of common-cause failures.

4.2 FAULT TREE SYMBOLS AND DEFINITIONS 

This section presents most of commonly used fault tree symbols and 
definitions. For more comprehensive symbols and definitions one should 
consult references 65 and 1 24. 

AND Gale. The AND gale denotes lhat an output event occurs if and 
only if all the input events occur. 




rj,;!i',iil 



Inpull 
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OR Gate. The OR gate denotes that an output event occurs if any one or 
more of the input events occur. 




Outpui 



Input! 



Exclusive OR Gate. I he output of this gate is an intermediate event. This 
gate denotes lhat there is no output unless one and only one of the mpui 
events occurs. 




Outpul 



Input] 



fSftority AND Gale. It is logically equivalent lo an AND gate with ihc 
exception thai the input events must occur in a specific order. It is 
represented b_\ the following symbol: 




Ckiipul 



Inputs 



Inhibit Gate. This gate produces output only when the conditional input 
is satisfied. The inhibit gate is logically equivalent to an "AND" gate with 
two input evenls. 



OutpUt !*.!! 

(Etfectl 




iiuaur ton 
(Cjuiil 
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Special Gate. This gate represents any olher legitimate combination of 
the input events. 



Output 




Inputs 



Delay Gate. This represents a gate whose output only occurs after a 
specified delay lime has elapsed. 



Outtmi 
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The Triangle. A triangle denotes a transfer IN or OUT, It is used to avoid 
repeating sections of the fault tree. A line from the top of the triangle 
indicates "transfer in." A line from the side of the triangle denotes 
"transfer out." 



Transfer 
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Resultant Event . A rectangle denotes an event which results from the 
combination of fault events through the input of a logic gate. 



Basic Fault Event. A circle represents a basic fault event or the failure of 
an elementary component. The failure parameters such as unavailability, 
probability, failure, and repair rates of a fault event are obtained from the 
empirical date or other sources. 



Incomplete Event. A diamond represents a fault event whose causes have 
not been fully developed. This event could be further developed to show 
basic contributory failures; however, it is noi developed either due to lack 
of information or due to lack of interest. 




Trigger Event. The house shape symbol denotes a fault event which is 
expected to occur. 



Traniler "iki1" 
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The Conditional Event. This is denoted by an ellipse, ["his symbol indi- 
cates any condition or restriction that applies to a logic gate. 




Double Diamond Ft cm This symbol represents an undeveloped fault 
event that requires further development to accomplish the fault tree. 




The Upside Down Triangle. This symbol denotes a similarity transfer, that
is, the input is similar but not identical to the like-identified input.




4.3 GENERAL PROCEDURE TO ANALYZE FAULT TREES

To develop fault trees, the following basic steps are generally required:

1. Define the undesired event (top event) of the system under
consideration.

2. Thoroughly understand the system and its intended use. 

3. To obtain the causes of the predefined system fault condition, determine
the higher order functional events. In addition, continue the fault event
analysis to determine the logical interrelationship of lower level events
that can cause them.

4. After accomplishing steps 1 through 3, construct a fault tree of logical
relationships among input fault events. These are to be defined in terms of
basic, identifiable, and independent faults.

To obtain quantitative results for the top event (undesired event), assign
failure probability, unavailability, failure rate, and repair rate data to the
basic events, provided the fault tree events are redundancy free.

A more rigorous and systematic approach requires the following steps:



1. System definition.

2. Fault tree construction. 

3. Qualitative evaluation. 

4. Quantitative evaluation. 

The above steps are outlined in detail in the following sections: 

4.3.1 System Definition 

Establishing the system definition is a very difficult task in fault tree
analysis. A system is normally represented by a functional layout diagram
showing all functional interconnections and components of the system in
question. Before drawing the fault tree of a system, it is strongly
recommended that the system boundary conditions be established. However,
care must be taken so that these boundary conditions are not confused with
the physical bounds of the system.

One of the most important boundary requirements is the top event
(undesired event). Care must therefore be taken in defining the system top
event for which the fault tree is to be drawn, because this is a major system
failure. In addition, to make the fault tree analysis understandable to
others, the analyst must list all the assumptions made about the system
definition and the fault tree.

4.3.2 Fault Tree Construction

The major objective of fault tree construction is to represent symbolically
the system conditions that may cause the system to fail. Furthermore,
fault tree construction can pinpoint the system weaknesses in a visible
form. The fault tree acts as a visual tool for communicating and supporting
decisions based on the analysis, and for performing trade-off studies or
determining the adequacy of the system design.

Generally the analyst is expected to understand the system thoroughly 
before he proceeds to construct a system fault tree. To enhance the fault 
tree analysis, a system description should be part of the analysis documen- 
tation. 

There are three generally accepted approaches to construct fault trees: 

1. Primary failure technique. 

2. Secondary failure technique. 

3. Commanded failure technique. 

The above techniques are used at the discretion of the reliability analyst,
according to the main requirements of the fault tree analysis.

Primary Failure Fault Tree Construction. The failure of a component is
called a primary failure if it occurs while the part is functioning within the
operating parameters for which it was designed. Constructing a fault tree
using only primary failures is a straightforward process, because the fault
tree is developed only to the point where identifiable primary component
failures produce fault events. The following example illustrates this
technique.

Example 1. Construct a fault tree of a simple system concerning a room
containing a switch and a light bulb. Assume the switch only fails to close.
In addition, the top event is the dark room.

The system fault tree is shown in Figure 4.1. The basic or primary events
of the fault tree are as follows:

1. Power failure, E_1.

2. Fuse failure, E_2.

3. Switch fails to close, E_3.

4. Bulb burnt out, E_4.

The intermediate event is "power off." The failure event of main
concern is the top event, labeled "dark room." Therefore, the major
emphasis of this analysis is on the darkness in the room. The fault tree
in Figure 4.1 shows that the input events are gated through OR gates. At
the occurrence of any one of the four basic events E_1, E_2, E_3, E_4, the
system top event ("dark room") will occur.
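
The logic of Example 1 can be sketched in a few lines of Python. This is a
minimal illustration only. Figure 4.1 is not reproduced here, so the sketch
assumes that "power off" is the OR of E_1 and E_2 and that it feeds the top
OR gate together with E_3 and E_4. The function names are hypothetical.

```python
def or_gate(*inputs: bool) -> bool:
    """An OR gate output occurs when at least one input event occurs."""
    return any(inputs)


def dark_room(e1: bool, e2: bool, e3: bool, e4: bool) -> bool:
    """Top event of Example 1 ("dark room").

    e1: power failure, e2: fuse failure,
    e3: switch fails to close, e4: bulb burnt out.
    Assumed structure: "power off" = E1 OR E2, which feeds the top OR gate.
    """
    power_off = or_gate(e1, e2)          # intermediate event
    return or_gate(power_off, e3, e4)    # top event


if __name__ == "__main__":
    # No basic event occurs, then a fuse failure alone.
    print(dark_room(False, False, False, False))
    print(dark_room(False, True, False, False))
```

Because every gate is an OR gate, each basic event on its own is enough to
produce the top event.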

Secondary Failure Fault Tree Construction. To include secondary failures 
in fault tree analysis requires a greater insight into the system. The fault 
tree analysis is carried out beyond the basic component failure level. The 
secondary failures are due to excessive environmental or operational stress 
placed on the system components. 

Example 2. A simple fault tree with the top event "motor fails to deliver
power" is shown in Figure 4.2. The fault tree shows primary events
such as switch fails to close, internal motor circuitry failure, power failure,
and fuse failure. Secondary failures are represented in the rectangle as an
intermediate event. The secondary failures shown in Figure 4.2 occur due
to inadequate maintenance, hostile external environment, external
catastrophe, and so on. These failures are discussed later in this chapter.

Figure 4.2 A fault tree with secondary failures.

Fault Tree Construction with Command Failures. These failures result from
proper component operation at the wrong time or place. Command failures
are failures of the coordinating events between various levels of the fault
tree, from basic failure events to the top event (undesired event or final
event). A typical example of a command failure is an erroneous electrical
signal to an electrical device (e.g., a motor or a transducer). Figure 4.3
shows the interrelationship among basic and commanded failures. In
Figure 4.3 the basic failure is represented by a circle, whereas the rectangle
represents a commanded fault.



4.3.3 Qualitative Fault Tree Evaluation

This approach uses the minimal cut sets of a fault tree. A cut set is defined
as a set of basic events whose occurrence results in the undesired event.
Furthermore, if a cut set cannot be reduced but still ensures the occurrence
of the undesired event, the set is a minimal cut set. Obtaining minimal cut
sets by hand is a tedious process, so a computerized algorithm is generally
required. A qualitative evaluation example is presented in Figure 4.4.

As shown in Figure 4.4, the intermediate fault event B can occur only if
both events E_1 and E_2 occur. The intermediate event C can occur if
either event E_3 or event E_4 is present. The top event results if
either one of the intermediate events B or C occurs. The minimal cut sets
are therefore {E_1, E_2}, {E_3}, and {E_4}.
4.3.4 Quantitative Fault Tree Evaluation

This evaluation deals with quantitative reliability information for the top
event, such as failure probability, failure rate, or repair rate. Component
failure parameters are evaluated first, then the critical path, and finally
the top event. There are two accepted methods to determine quantitative
fault tree results:

1. The Monte Carlo simulation method.

2. The analytical solution approach. 

In the case of the Monte Carlo simulation, the fault tree is simulated on
a digital computer to obtain quantitative results. Generally, the fault tree
failures are simulated over thousands or millions of trial years of
performance. Some of the main steps required to simulate a fault tree on a
digital computer are as follows:

1. Assign failure data to the basic events. 

2. Represent the entire fault tree on a digital computer. 

3. List failures that lead to occurrence of the top event and the associated 
minimal cut sets. 

4. Compute the desired end results. 
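
The Monte Carlo steps listed above can be sketched as follows. This is a
simplified illustration, not a production simulator. It samples each basic
event once per trial with a fixed probability rather than simulating failure
times over trial years, and it uses the gate structure described for
Figure 4.4 (B = E_1 AND E_2, C = E_3 OR E_4, top = B OR C). The
probabilities and function names are hypothetical.

```python
import random


def top_event(e1: bool, e2: bool, e3: bool, e4: bool) -> bool:
    """Step 2: the fault tree logic (structure described for Figure 4.4)."""
    b = e1 and e2        # intermediate event B (AND gate)
    c = e3 or e4         # intermediate event C (OR gate)
    return b or c        # top event (OR gate)


def simulate(probabilities, trials=100_000, seed=1):
    """Steps 1, 3 and 4: sample basic events, count top event occurrences."""
    rng = random.Random(seed)
    occurrences = 0
    for _ in range(trials):
        states = [rng.random() < p for p in probabilities]
        if top_event(*states):
            occurrences += 1
    return occurrences / trials


def exact(probabilities):
    """Closed form for independent events, used only as a cross-check."""
    p1, p2, p3, p4 = probabilities
    return 1 - (1 - p1 * p2) * (1 - p3) * (1 - p4)


if __name__ == "__main__":
    data = [0.1, 0.1, 0.1, 0.1]   # hypothetical basic event probabilities
    print(simulate(data), exact(data))
```

The estimate approaches the closed-form value as the number of trials grows,
which is why simulations of this kind are usually run for a very large number
of trials.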
The direct analytical solution method makes use of existing analytical
techniques. These techniques are described in the forthcoming sections.

4.4 ANALYTICAL DEVELOPMENTS OF BASIC GATES 

Fault trees are constructed to show system components pictorially and
logically. A fault tree is constructed by using AND, OR, and other gates
that logically relate various basic component faults to the top event.
Boolean algebra is an invaluable tool for representing these logic diagrams
in mathematical form. The mathematical expressions for OR, AND, and
Priority AND gates are developed in the following sections.

4.4.1 OR Gate

The OR gate is represented by the symbol ∪ or +. Either symbol denotes
the union of events associated with an OR gate. A two-input OR gate is
shown in Figure 4.5. The output event B_0 of an OR gate is written in
Boolean algebra as

    B_0 = B_1 + B_2    (4.1)

where B_1 and B_2 are the input events.

4.4.2 AND Gate

In Boolean algebra the AND situation is represented by the symbol · or
∩. Either symbol represents the intersection of events. The two-input AND
gate is shown in Figure 4.6. The output event B_0 of the AND gate is
represented in Boolean algebra by (4.2):

    B_0 = B_1 · B_2    (4.2)




Figure 4.5 An OR gate with two input events.

Figure 4.6 A two-input AND gate.

Figure 4.7 A two-input priority AND gate.



4.4.3 Priority AND Gate 

This is logically equivalent to an AND gate with an exception that its input 
events must occur in a specified order. A two input priority AND gate is 
shown in Figure 4.7. 

In this situation it is supposed that event A_1 must occur before event
A_2. The development of a mathematical expression for the gate is
presented in reference 31.



4.5 A FAULT TREE WITH REPEATED EVENTS

This type of situation is illustrated in Figure 4.8. The letters in the
diagram represent the fault events: A_1, A_2, A_3, and C denote the basic
fault events; B_0, B_1, and B_2 the intermediate fault events; and T the
top event.

The fault tree shown in Figure 4.8 can be represented by the following
Boolean expressions:

    T = C B_0    (4.3)
    B_0 = B_1 B_2    (4.4)
    B_1 = A_1 + A_2    (4.5)
    B_2 = A_1 + A_3    (4.6)

By substituting expressions (4.4) through (4.6) in expression (4.3) we get

    T = C (A_1 + A_2)(A_1 + A_3)    (4.7)

Figure 4.8 clearly shows that event A_1 is the repeated basic fault
event. Therefore, expression (4.7) has to be simplified by applying
the basic Boolean algebra properties.

Figure 4.8 A fault tree with repeated events.

Basic Boolean Algebra Properties

1. Laws of absorption:

    A + (AB) = A    (4.8)
    A(AB) = AB    (4.9)

2. Identities:

    A + A = A    (4.10)
    AA = A    (4.11)

3. Distributive law:

    A + BC = (A + B)(A + C)    (4.12)

By applying the distributive law of expression (4.12) to expression (4.4)
we get

    B_0 = A_1 + A_2 A_3    (4.13)

By using expression (4.13) in (4.3), expression (4.7) reduces to

    T = C [A_1 + A_2 A_3]    (4.14)

Figure 4.9 A simplified fault tree.

Because of expression (4.14), our original fault tree of Figure 4.8 reduces
to the one shown in Figure 4.9.
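
A quick way to check a Boolean reduction of this kind is to compare the two
expressions over every combination of basic events. The following Python
sketch does this for expressions (4.7) and (4.14). The function names are
hypothetical and chosen only for this illustration.

```python
from itertools import product


def original(c: bool, a1: bool, a2: bool, a3: bool) -> bool:
    """Expression (4.7): T = C (A1 + A2)(A1 + A3), with A1 repeated."""
    b1 = a1 or a2
    b2 = a1 or a3
    return c and (b1 and b2)


def simplified(c: bool, a1: bool, a2: bool, a3: bool) -> bool:
    """Expression (4.14): T = C [A1 + A2 A3], free of repeated events."""
    return c and (a1 or (a2 and a3))


# Exhaustive truth-table comparison over all 2**4 basic event states.
equivalent = all(
    original(*state) == simplified(*state)
    for state in product([False, True], repeat=4)
)

if __name__ == "__main__":
    print(equivalent)
```

An exhaustive comparison is practical only for small trees, since the number
of states doubles with every basic event.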

Therefore, it is always recommended to reduce the repeated event
expression by applying the Boolean properties before obtaining the
quantitative reliability parameter results. Otherwise, the quantitative results
will be misleading. Algorithms to obtain a repeated event free fault tree are
presented in references 1, 27, 33, 66, and 12. One such algorithm is
presented in the following section.



4.6 AN ALGORITHM TO OBTAIN MINIMAL CUT SETS

A difficult problem associated with the fault tree technique is to obtain
the minimal cut sets of a fault tree. Here we present an algorithm developed
in references 33, 27, and 1. Other computer oriented algorithms may be found
in references 66, 12, and 9.

Before presenting this algorithm, we give the definitions of both a cut set
and a minimal cut set. These definitions are taken from reference 1.

A Cut Set. This is a collection of basic events whose presence will cause 
the top event to occur. 

A Minimal Cut Set. A cut set is said to be minimal if it cannot be further
reduced but still ensures the occurrence of the top event. Minimal cut
sets are sometimes called the minimal failure modes of a system.

The algorithm presented here can be used manually for simple fault
trees. However, for a complex fault tree with hundreds of gates and basic
events, it has to be computerized.

The algorithm is quite efficient. Its main features are that an AND gate
always increases the size of a cut set, whereas an OR gate increases the
number of cut sets. These facts become self-explanatory in the following
solved example. We consider a solved example more useful for understanding
the practical aspects of this algorithm than a presentation of the
background theory. Readers interested in the theoretical background should
consult reference 33.

Example 3. The fault tree of the hypothetical example is shown in Figure
4.10. The gates are labeled GT and the basic events are labeled with
numerals. The algorithm begins from the gate immediately below the top
event, labeled GT0 in the example. In general, the top event gate may be
either an AND or an OR gate.

If the top event gate, GT0, is an OR gate, then each of its inputs becomes
an entry in a separate row of the list matrix. In the case of an AND gate,
each input becomes an entry in a separate column of the list matrix.

For example, as shown in Figure 4.10, the top event gate, GT0, is an OR
gate. Therefore, we begin the formulation of the list matrix by listing its
inputs, GT1 and GT2 (output events), in a single column but in separate
rows as follows:

    GT1        <- Step 1
    GT2

Any one input of an OR gate will cause the occurrence of the output
event. Therefore, the inputs of GT0 are members of separate cut sets.

Figure 4.10 An event tree.

A simple rule to follow in this technique is to replace each gate by its
inputs. The inputs may be the outputs of other gates or basic events. The
replacement continues until all the fault tree gates are replaced with basic
event entries. At this stage the list matrix is complete.

For this example, to obtain a fully constructed list matrix we now
replace the OR gate GT1 by its input events as separate rows, marked as
step 2:

    1          <- Step 2
    GT3
    2
    GT2

Similarly, replace GT2 by its input events, marked as step 3:

    1
    GT3
    2
    GT4        <- Step 3
    GT5

In similar fashion, we proceed with the gate GT3. It is an AND gate;
therefore, it is replaced with its input events in a single row, marked as
step 4:

    1
    3, GT6     <- Step 4
    2
    GT4
    GT5

Similarly, GT4, an AND gate, is replaced by its inputs, marked as step 5:

    1
    3, GT6
    2
    4, 5       <- Step 5
    GT5

Since GT5 is an OR gate, it is replaced by its input events 7 and 8 in
separate rows, shown as step 6 below:

    1
    3, GT6
    2
    4, 5
    7          <- Step 6
    8

Similarly, the gate GT6 is also an OR gate; therefore, it is replaced by its
input events 8 and 9 (marked as step 7) as follows:

    1
    3, 8       <- Step 7
    3, 9
    2
    4, 5
    7
    8

As shown above in the list matrix, the cut set 8 is a single event cut set.
Therefore, eliminate the cut set (3, 8) to obtain the following minimal cut
sets:

    1
    2
    7
    8
    3, 9
    4, 5

Figure 4.11 A repeated event free fault tree.

Finally, if there is no repeated event in the list matrix, then the cut sets
generated by this method will be minimal cut sets. If this is not so, then
eliminate the nonminimal cut sets (i.e., those which contain other cut sets)
from the final list matrix.

The reduced fault tree of the above list matrix is drawn in Figure 4.11.
This is a repeated event free fault tree. Now one may proceed to obtain the
quantitative measures of the top event.
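
The list matrix procedure of Example 3 can be sketched in Python as shown
below. This is one possible reading of the algorithm, not the referenced
authors' implementation. Figure 4.10 is not reproduced here, so the gate
table is an assumption read off from steps 1 through 7 of the example. The
names GATES, cut_sets, and minimal are hypothetical.

```python
# Gate table assumed from the steps of Example 3: gate -> (type, inputs).
GATES = {
    "GT0": ("OR", ["GT1", "GT2"]),
    "GT1": ("OR", [1, "GT3", 2]),
    "GT2": ("OR", ["GT4", "GT5"]),
    "GT3": ("AND", [3, "GT6"]),
    "GT4": ("AND", [4, 5]),
    "GT5": ("OR", [7, 8]),
    "GT6": ("OR", [8, 9]),
}


def cut_sets(gates, top):
    """Replace gates by their inputs until only basic events remain."""
    rows = [[top]]                      # the list matrix, one cut set per row
    while True:
        # Find the first row that still contains an unexpanded gate.
        for index, row in enumerate(rows):
            gate = next((item for item in row if item in gates), None)
            if gate is not None:
                break
        else:
            break                       # no gates left: matrix is complete
        kind, inputs = gates[gate]
        rest = [item for item in row if item != gate]
        if kind == "AND":
            new_rows = [rest + list(inputs)]             # same row grows
        else:
            new_rows = [rest + [inp] for inp in inputs]  # one row per input
        rows[index:index + 1] = new_rows
    return {frozenset(row) for row in rows}


def minimal(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}


if __name__ == "__main__":
    for cut in sorted(minimal(cut_sets(GATES, "GT0")), key=sorted):
        print(sorted(cut))
```

The two branches of the expansion mirror the rule stated in the text: an AND
gate lengthens a row, while an OR gate adds rows.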



4.7 FAULT TREE DUALITY

To reliability engineers it may be of great interest to obtain the dual fault
tree. For example, the top event "system does not fail" is the dual of
"system failure." Generally, the occurrence of the top event is of more
interest to the safety analyst, from the system safety viewpoint. The
nonoccurrence of the top event may be of more interest to the reliability
analyst.

As the words "occurrence" and "nonoccurrence" of a top event suggest
duality, it is simple to obtain a "success tree" from a "fault tree." To
obtain a success tree (i.e., the dual of a fault tree), replace all AND gates
with OR gates in the original fault tree and vice versa. In addition, the
top, intermediate, and basic fault events are to be replaced by their
corresponding duals (success events); in other words, occurrence events are
replaced with nonoccurrence events. For example, if the top event was
"room dark," it is to be replaced with the top event "room lit."

The minimal cut sets of the original fault tree will be the minimal path
sets of the dual fault tree (success tree). A path set may be defined as a
set of basic events whose nonoccurrence ensures the nonoccurrence of the
top event. A minimal path set is a path set that cannot be further reduced
and still retain its path set characteristics. The algorithm presented in the
previous section to obtain minimal cut sets of a fault tree can be applied to
obtain the minimal path sets of the dual fault tree.
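
The gate-swapping rule can be illustrated with a short Python sketch. It
builds the success tree of the repeated-event tree of Section 4.5
(T = C B_0, B_0 = B_1 B_2, B_1 = A_1 + A_2, B_2 = A_1 + A_3) and checks
that, for every state of the basic events, the success tree evaluated on the
complemented events gives the complement of the fault tree. The nested-tuple
representation and the function names are assumptions made for this sketch.

```python
from itertools import product

# Fault tree of Section 4.5 as nested tuples: (gate, input, input, ...).
FAULT_TREE = ("AND", "C", ("AND", ("OR", "A1", "A2"), ("OR", "A1", "A3")))


def dual(tree):
    """Swap AND and OR gates; leaves keep their names (read as successes)."""
    if isinstance(tree, str):
        return tree
    gate, *inputs = tree
    swapped = "OR" if gate == "AND" else "AND"
    return (swapped, *(dual(node) for node in inputs))


def evaluate(tree, state):
    """Evaluate a tree for a dict mapping leaf names to True/False."""
    if isinstance(tree, str):
        return state[tree]
    gate, *inputs = tree
    values = [evaluate(node, state) for node in inputs]
    return all(values) if gate == "AND" else any(values)


def leaves(tree):
    """Collect the names of the basic events of a tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(node) for node in tree[1:]))


def duality_holds(tree):
    """Success tree on complemented events == complement of fault tree."""
    names = sorted(leaves(tree))
    success_tree = dual(tree)
    for bits in product([False, True], repeat=len(names)):
        failed = dict(zip(names, bits))
        working = {name: not value for name, value in failed.items()}
        if evaluate(success_tree, working) != (not evaluate(tree, failed)):
            return False
    return True


if __name__ == "__main__":
    print(dual(FAULT_TREE))
    print(duality_holds(FAULT_TREE))
```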



4.8 PROBABILITY EVALUATION OF A FAULT TREE

Once the minimal cut sets or the redundancy free events of a fault tree are 
obtained, then one can proceed to evaluate the probability of the top 
event. However, before we proceed to evaluate the probability of a fault 
tree, we will review the basic concepts of probability laws as applied to 
logic gates. 

Two basic operational laws of probability are presented by solving OR
and AND gates.

Figure 4.12 A two-input OR gate.



4.8.1 OR Gate 

To explain the OR gate probability concept, we analyze a two-input
OR gate as shown in Figure 4.12. For Figure 4.12, the probability
expression for the top event is given by

    P(T) = P(a) + P(b) - P(a·b)    (4.15)

If a and b are statistically independent events and P(a)P(b) is very small,
then expression (4.15) can be approximated as

    P(T) ≈ P(a) + P(b)    (4.16)

In the case of an n-input OR gate, expression (4.16) may be generalized to

    P(a + b + c + ···) ≈ P(a) + P(b) + P(c) + ···    (4.17)

The above approximation is good if the summation in expression
(4.17) is very small, which implies that the basic event probabilities
P(a), P(b), P(c), ... are very small. However, expression (4.17) yields the
exact result if events a, b, c, ... are mutually exclusive. The exact
expression of (4.17) is presented in Section 4.12.

4.8.2 AND Gate 

A two-input AND gate is shown in Figure 4.13. In the case of
statistically independent events a and b, the multiplication rule of
probability is applied to obtain the following top event probability
expression:

    P(ab) = P(a) · P(b)    (4.18)

For an n-input AND gate, the above equation can be generalized as

    P(a b c ···) = P(a) · P(b) · P(c) ···    (4.19)

Figure 4.13 A two-input AND gate.
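
The two gate rules can be sketched in Python as follows. The sketch assumes
statistically independent inputs throughout. The exact OR gate value is
computed as one minus the product of the complements, which for two inputs
is the same as expression (4.15). The function names are hypothetical.

```python
from fractions import Fraction
from math import prod


def or_gate_exact(probabilities):
    """Exact OR gate probability for independent inputs."""
    return 1 - prod(1 - p for p in probabilities)


def or_gate_approx(probabilities):
    """Rare event approximation of (4.16) and (4.17): a plain sum."""
    return sum(probabilities)


def and_gate(probabilities):
    """Multiplication rule of (4.18) and (4.19) for independent inputs."""
    return prod(probabilities)


if __name__ == "__main__":
    quarter = Fraction(1, 4)
    print(or_gate_exact([quarter, quarter]), or_gate_approx([quarter, quarter]))
    print(and_gate([quarter, quarter]))
    # With small probabilities the approximation is close to the exact value.
    print(or_gate_exact([0.001, 0.002]), or_gate_approx([0.001, 0.002]))
```

Exact fractions are used in the first two calls so that the results can be
compared directly with hand calculations.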



Example 4. Evaluate the top event failure probability of the fault tree
shown in Figure 4.14. Assume that the basic events A, B, C, D, and E are
statistically independent and P(A) = P(B) = P(C) = P(D) = P(E) = 1/4. The
fault tree of Figure 4.14 does not have any repeated basic events.
Therefore, the probability of occurrence can be evaluated at the
output of each gate. However, if repeated events were present in the fault
tree, then one must first eliminate the repeated events (i.e., obtain the
minimal cut sets of the fault tree) before taking the probability of
occurrence at the output of each gate.

Figure 4.14 A hypothetical event tree.

The fault tree shown in Figure 4.14 can be solved by the following two
different methods.

Method 1. Write the expression for the top event in terms of basic events.
Obtain the probability of occurrence of this expression as follows. The top
event expression is given by

    T_0 = T_1 + T_2    (4.20)

where

    T_2 = CD    (4.21)
    T_1 = T_3 E    (4.22)
    T_3 = A + B    (4.23)

Hence,

    T_0 = E(A + B) + CD    (4.24)

Therefore,

    P(T_0) = P(EA + EB + CD)    (4.25)

Now expression (4.25) can be expanded to obtain the top event probability
expression. If we assume statistical independence of the failure events,
then we can obtain the quantitative probability result of the top event.

Method 2. This is an alternative method to obtain the quantitative value
of the top event probability by calculating the intermediate event
probabilities and then using these results to obtain the top event
probability. Note that we assume here that the failure events are
statistically independent. By using expressions (4.15) and (4.18), the
intermediate and top event quantitative results and expressions are as
follows:

    P(T_3) = P(A) + P(B) - P(A)·P(B) = 1/4 + 1/4 - 1/16 = 7/16    (4.26)
    P(T_2) = P(C)·P(D) = 1/4 · 1/4 = 1/16    (4.27)
    P(T_1) = P(T_3)·P(E) = 7/16 · 1/4 = 7/64    (4.28)
    P(T_0) = P(T_1) + P(T_2) - P(T_1)·P(T_2)
           = 7/64 + 1/16 - 7/64 · 1/16 = 169/1024    (4.29)

∴ Probability of occurrence of top event = 169/1024
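
The result of Example 4 can be cross-checked by brute force. The sketch
below sums the probabilities of all basic event states for which the top
event expression (4.24) is true, using exact fractions. It assumes
statistically independent basic events with a common probability. The
function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def top_event(a: bool, b: bool, c: bool, d: bool, e: bool) -> bool:
    """Expression (4.24): T0 = E(A + B) + CD."""
    return (e and (a or b)) or (c and d)


def top_probability(structure, n_events, p):
    """Sum the probabilities of every state in which the top event occurs."""
    total = Fraction(0)
    for state in product([False, True], repeat=n_events):
        if structure(*state):
            weight = Fraction(1)
            for occurred in state:
                weight *= p if occurred else 1 - p
            total += weight
    return total


if __name__ == "__main__":
    print(top_probability(top_event, 5, Fraction(1, 4)))
```

Enumeration of this kind makes no use of the gate-by-gate formulas, so it
serves as an independent check on both methods of the example.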
Figure 4.15 A fault tree with repeated event D.

Example 5. Suppose that in Figure 4.14 the event E is replaced by event D,
as shown in Figure 4.15. To obtain the top event probability of the fault
tree shown in Figure 4.15, we apply method 1 of the previous example. The
top event expression in terms of basic events (without eliminating the
repeated event D) is given by

    T_0 = (A + B)D + CD    (4.30)

Thus,

    T_0 = DA + BD + CD    (4.31)

By taking the probability of the top event, we get

    P(DA + BD + CD) = P(DA) + P(BD) + P(CD) - P(DA·BD)
                      - P(DA·CD) - P(BD·CD) + P(DA·BD·CD)    (4.32)

The redundancy-free expression with statistically independent events is
given by

    P(DA + BD + CD) = P(A)P(D) + P(B)P(D) + P(C)P(D)
                      - P(D)P(A)P(B) - P(A)P(C)P(D)
                      - P(B)P(C)P(D) + P(A)P(B)P(C)P(D)    (4.33)

    ∴ P(DA + BD + CD) = 1/16 + 1/16 + 1/16 - 1/64
                        - 1/64 - 1/64 + 1/256 = 37/256

The probability of occurrence of the top event is 37/256.

However, if one eliminates the repeated events first, then the fault tree
shown in Figure 4.15 reduces to the one shown in Figure 4.16. The top
event expression for Figure 4.16 becomes

    T_0 = D T_1    (4.34)

where

    T_1 = A + B + C    (4.35)

Figure 4.16 A repeated event free fault tree.

For statistically independent events, the probability expression for (4.34)
and (4.35) is given by

    P(D T_1) = P(D) P(T_1) = 1/4 · 37/64 = 37/256    (4.36)

where

    P(A + B + C) = P(A) + P(B) + P(C) - P(A)P(B) - P(A)P(C)
                   - P(B)P(C) + P(A)P(B)P(C) = 37/64
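
The effect of the repeated event D can be made concrete with a short Python
sketch. It compares a naive gate-by-gate evaluation of the tree of Example 5,
which wrongly treats the two occurrences of D as independent events, with an
exact evaluation obtained by enumerating all states of A, B, C, and D. The
variable and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

P = Fraction(1, 4)   # common basic event probability of the example


def top_event(a: bool, b: bool, c: bool, d: bool) -> bool:
    """Expression (4.30): T0 = (A + B)D + CD, with D repeated."""
    return ((a or b) and d) or (c and d)


def exact_probability(structure, n_events, p):
    """Sum the probabilities of every state in which the top event occurs."""
    total = Fraction(0)
    for state in product([False, True], repeat=n_events):
        if structure(*state):
            weight = Fraction(1)
            for occurred in state:
                weight *= p if occurred else 1 - p
            total += weight
    return total


# Naive gate-by-gate evaluation that ignores the repetition of D.
p_t3 = P + P - P * P                  # OR gate: A + B
p_t1 = p_t3 * P                       # AND gate: (A + B) D
p_t2 = P * P                          # AND gate: C D
naive = p_t1 + p_t2 - p_t1 * p_t2     # top OR gate, inputs treated as independent

exact = exact_probability(top_event, 4, P)

if __name__ == "__main__":
    print(naive, exact)
```

The naive figure differs from the exact one because the two inputs of the
top OR gate share the event D and are therefore not independent.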



4.8.3 Concluding Remarks

In cases where the basic event failure probabilities are very small, the 
inability to remove dependencies will not introduce a significant error in 
the end result [65]. However, one must try to remove all the dependencies 
in a fault tree before obtaining the final probability result. 



4.9 FAILURE RATE EVALUATION OF FAULT TREES

This section outlines how to obtain the failure rate of the fault tree top
event as well as the intermediate events. The following assumptions are
made to develop this procedure:

1. The basic events (system components) are not repaired.

2. The fault event occurrence times (or component failure times) are
exponentially distributed.

3. The fault tree is redundancy free. In other words, it contains no
repeated events.

4. The basic fault occurrences or component failures are statistically
independent.

The fault tree OR and AND gate failure rate expressions are developed
by using the following relationship:

    \lambda(t) = -\frac{1}{R(t)} \frac{dR(t)}{dt}    (4.37)

where \lambda(t) = the failure rate (hazard rate) at time t
      R(t) = the component or system reliability

For constant component failure rates, the OR and AND gate
failure rate (hazard rate) formulas are developed in the following sections.
4.9.1 OR Gate

Logically this gate corresponds to a series system. A series system
reliability can be obtained from the following equation:

    R_s = \prod_{i=1}^{n} R_i    (4.38)

where R_i = the constant reliability of the ith component
      R_s = the series system reliability
      n = the number of components

When component failure times follow exponential failure laws, (4.38)
becomes

    R_s(t) = \exp\left( -\sum_{i=1}^{n} \lambda_i t \right)    (4.39)

where \lambda_i = the constant failure rate of the ith component
      t = the time

Substituting (4.39) into (4.37) yields the series system hazard rate

    \lambda_s(t) = \sum_{i=1}^{n} \lambda_i    (4.40)

It can be recognized from the series system failure rate equation (4.40)
that the failure rate of an OR gate output is simply the sum of the failure
rates of its inputs.

4.9.2 AND Gate

The AND gate corresponds logically to a parallel configuration
system. A parallel network reliability is given by the following equation:

    R_p = 1 - \prod_{i=1}^{n} (1 - R_i)    (4.41)

where R_p = the parallel system reliability
      n = the number of components
      R_i = the ith component reliability

In the case of constant component failure rates, the above equation
becomes

    R_p(t) = 1 - \prod_{i=1}^{n} (1 - e^{-\lambda_i t})    (4.42)

where \lambda_i = the ith component constant failure rate
      t = the time

After substituting (4.42) into (4.37), we get the following [53] result:

    \lambda_p(t) = \left[ \sum_{i=1}^{n} \lambda_i (y_i - 1) \right] \left[ \prod_{i=1}^{n} y_i - 1 \right]^{-1}    (4.43)

where y_i = (1 - e^{-\lambda_i t})^{-1} for i = 1, 2, 3, ..., n.
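
As a rough sketch, the OR and AND gate hazard rate formulas can be coded
and checked against the defining relationship (4.37). The Python below
evaluates the AND gate formula in the form [sum of \lambda_i (y_i - 1)]
divided by [product of y_i, minus 1], with y_i = 1/(1 - e^{-\lambda_i t}),
and compares it with a numerical derivative of the parallel system
reliability. The failure rates, the mission time, and the function names are
hypothetical.

```python
import math


def or_gate_rate(rates):
    """OR gate (series system): the sum of the input failure rates."""
    return sum(rates)


def and_gate_rate(rates, t):
    """AND gate (parallel system) hazard rate at time t for constant rates."""
    y = [1.0 / (1.0 - math.exp(-lam * t)) for lam in rates]
    numerator = sum(lam * (yi - 1.0) for lam, yi in zip(rates, y))
    return numerator / (math.prod(y) - 1.0)


def parallel_reliability(rates, t):
    """R_p(t) = 1 - product of the component unreliabilities."""
    return 1.0 - math.prod(1.0 - math.exp(-lam * t) for lam in rates)


def numerical_hazard(reliability, t, h=1e-4):
    """Hazard rate -R'(t)/R(t), with a central difference for R'(t)."""
    slope = (reliability(t + h) - reliability(t - h)) / (2.0 * h)
    return -slope / reliability(t)


if __name__ == "__main__":
    rates, mission = [0.002, 0.005], 50.0   # hypothetical data
    print(or_gate_rate(rates))
    print(and_gate_rate(rates, mission))
    print(numerical_hazard(lambda s: parallel_reliability(rates, s), mission))
```

Unlike the OR gate result, the AND gate hazard rate depends on time even
when all input failure rates are constant.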

Example 6. Evaluate the top event failure rate of the fault tree shown in
Figure 4.17 for a 100-hour mission. Assume

    \lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = \lambda_5 = \lambda_6 = \lambda_7 = 0.001 failure/hour

By utilizing (4.40), the output event failure rates of OR gates GT1, GT3,
GT4, and GT0 are evaluated as follows:

    \lambda_{T0} = \lambda_{T1} + \lambda_{T2}(t) + \lambda_7 = 0.0040082 failure/hour    (4.44)
    \lambda_{T1} = \lambda_1 + \lambda_2 = 0.002 failure/hour    (4.45)
    \lambda_{T3} = \lambda_3 + \lambda_4 = 0.002 failure/hour    (4.46)

and

    \lambda_{T4} = \lambda_5 + \lambda_6 = 0.002 failure/hour    (4.47)

Similarly, we utilize (4.43) to obtain the output event failure rate of the
AND gate GT2 as follows for a 100-hour mission:

    \lambda_{T2}(t) = \frac{\lambda_{T3}(z_{T3} - 1) + \lambda_{T4}(z_{T4} - 1)}{z_{T3} z_{T4} - 1} = 0.0000082 failure/hour    (4.48)

where z_i = 1/(1 - e^{-\lambda_i t}) for i = T3, T4.

Figure 4.17 A hypothetical failure rate evaluation fault tree.

When an AND gate output event is an input event to another AND gate,
the hazard rates of all the intermediate events (including the top event)
can only be obtained from the reliability functions of these events. In
other words, the hazard rate or failure rate result obtained for an AND
gate output event cannot be used as an input to another AND gate.

If two or more AND gates are encountered in series, it is strongly
advised to establish the reliability function at the output event level of each
gate and then apply the hazard rate formula of (4.43).

Example 7. A fault tree with two AND gates in series, shown in Figure 4.18,
is used to compute the failure rate of the top event for a 100-hour
mission. Assume \lambda_1 = \lambda_2 = \lambda_3 = \lambda = 0.001
failure/hour and that the basic failures are statistically independent. By
utilizing (4.43) we get

    \lambda_{GT1}(t) = \frac{2\lambda}{y + 1} = 0.00018 failure/hour    (4.49)

where y = 1/(1 - e^{-\lambda t}).

The unreliability and reliability equations of the output events of gates
GT1 and GT0 are given by

    P_{GT1}(t) = P_1(t) P_2(t)    (4.50)
    R_{GT0}(t) = 1 - P_{GT1}(t) P_3(t)    (4.51)

where P_i(t) = the unreliability of event i at time t, for i = 1, 2, 3
      P_{GT1}(t) = the unreliability of the gate GT1 output event
      R_{GT0}(t) = the reliability of the top event

Figure 4.18 A fault tree with two AND gates in series.

To obtain the top event hazard rate, substitute (4.51) into (4.37). Since
\lambda_1 = \lambda_2 = \lambda_3, we use (4.43) to obtain the following
top event hazard rate result:

    \lambda_{GT0}(t) = \frac{3\lambda}{y^2 + y + 1} = 0.00002 failure/hour    (4.52)

where y = 1/(1 - e^{-\lambda t}).
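
The warning about AND gates in series can be illustrated numerically. The
Python sketch below treats the tree of Example 7 as GT1 = AND(1, 2) and
GT0 = AND(GT1, 3), which is an assumption about Figure 4.18. It compares
the hazard rate obtained from the reliability function of the top event,
which for this structure is the three-input form of the AND gate formula,
with the value obtained by wrongly feeding the GT1 hazard rate into a second
AND gate as if it were a constant failure rate. The function and variable
names are hypothetical.

```python
import math


def and_gate_rate(rates, t):
    """AND gate hazard rate at time t for constant input failure rates."""
    y = [1.0 / (1.0 - math.exp(-lam * t)) for lam in rates]
    numerator = sum(lam * (yi - 1.0) for lam, yi in zip(rates, y))
    return numerator / (math.prod(y) - 1.0)


LAM = 0.001      # failure/hour, as assumed in Example 7
MISSION = 100.0  # hours

# Correct: the top event occurs when all three basic events have occurred.
correct = and_gate_rate([LAM, LAM, LAM], MISSION)

# Incorrect: chain the gates by reusing the GT1 hazard rate as a constant.
gt1 = and_gate_rate([LAM, LAM], MISSION)
chained = and_gate_rate([gt1, LAM], MISSION)

# Closed form for three identical inputs, 3*lam / (y**2 + y + 1).
y = 1.0 / (1.0 - math.exp(-LAM * MISSION))
closed_form = 3.0 * LAM / (y * y + y + 1.0)

if __name__ == "__main__":
    print(correct, closed_form, chained)
```

The chained value overstates the top event hazard rate, because the output
of an AND gate is not exponentially distributed.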



4.10 FAULT TREE EVALUATION OF REPAIRABLE COMPONENTS

In this section we are concerned only with the fault tree evaluation of
repairable components [22]. This type of situation is frequently
encountered in real life, where the system components are normally repaired
whenever they fail. The method presented here assumes that the
component failures are statistically independent and that the component
failure and repair rates are constant; in addition, the repaired system
components are considered as good as new. Furthermore, this method is
applicable only to cases where one is concerned with calculating
the top or intermediate event steady-state unavailability, limiting mean
repair rate, limiting mean failure rate, and steady-state failure rate. Another
major assumption of this method is that the fault trees are redundancy free
(i.e., no repeated basic events are allowed).

In most cases a redundancy-free expression can be obtained by applying
basic Boolean reduction techniques. For certain cases it is simpler to
obtain Boolean indicated cut sets (BICS) [33] and then eliminate the
redundant cut sets by inspection. It may also be useful to eliminate as
many repeated (redundant) events as possible at the fault tree construction
stage and eliminate the remaining ones with the Boolean reduction
techniques. However, if some of the repeated events are impossible to
eliminate and if the probability of occurrence of the basic events is less
than 0.1 [65], the error generated in the end result will be either negligible
or of very small magnitude.

The main advantage of applying this technique is that the original
dependency free fault tree is unchanged, and the OR and AND gate
steady-state unavailability, limiting mean failure rate, limiting mean repair
rate, and steady-state failure rate formulas can be applied directly
to both the intermediate and top events of the fault tree. These formulas
[128] for the OR and AND gates are discussed in the following sections.

4.10.1 OR Gate 

This gate simply represents a series system with n nonideniical repairable 
components. The OR gate output event unavailability A s , can be obtained 
from the following equation: 

^, = 1-11 lW.) «- 53 > 

tex 

where J, = the unavailability of ihe repairable component i 
X—a set of n number of components 

For a repairable component with constant failure and repair rates the 
equation for the unavailability A may be expressed [129] as follows: 

J(f)-^ T <1-<r , * + >"> (4.54) 

(tTft 

where i - time 

A — the component failure rate 
jj = the component repair rate 

For large I. the above equation becomes 




(4.55) 



By substituting (4.55) into (4.53) we obtain for the scries system 



lex + 



(4.56) 
Similarly, the OR gate output event steady-state failure frequency,
limiting mean failure rate, and limiting mean repair rate equations are as
follows:

$\lambda_{ss}$ = {series system steady-state availability, $(1 - \bar{A}_s)$}
× {series system failure rate, $\lambda_{sm}$}

$$\lambda_{ss} = (1 - \bar{A}_s) \sum_{i \in X} \lambda_i \tag{4.57}$$

where $\lambda_{ss}$ is the series system steady-state failure frequency;

$$\lambda_{sm} = \sum_{i \in X} \lambda_i \tag{4.58}$$

where $\lambda_{sm}$ is the series system limiting mean failure rate;

$\mu_{sm}$ = {series system steady-state availability, $(1 - \bar{A}_s)$}
× {series system failure rate, $\lambda_{sm}$} / {series system
unavailability, $\bar{A}_s$}

$$\mu_{sm} = \frac{(1 - \bar{A}_s) \sum_{i \in X} \lambda_i}{\bar{A}_s} \tag{4.59}$$

where $\mu_{sm}$ is the series system limiting mean repair rate.
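
The OR gate relations (4.56) to (4.59) can be sketched in a few lines of
Python. This is only an illustrative sketch; the function name `or_gate`
and its argument layout are assumptions made here, not notation from the
text.

```python
from math import prod

def or_gate(failure_rates, repair_rates):
    """Steady-state measures of an OR gate (series system of repairable inputs).

    Hypothetical helper: returns (unavailability, failure frequency,
    limiting mean failure rate, limiting mean repair rate).
    """
    # Series availability is the product of the input availabilities mu/(lambda+mu).
    availability = prod(m / (l + m) for l, m in zip(failure_rates, repair_rates))
    unavailability = 1.0 - availability             # (4.56)
    mean_failure_rate = sum(failure_rates)          # (4.58)
    frequency = availability * mean_failure_rate    # (4.57)
    mean_repair_rate = frequency / unavailability   # (4.59)
    return unavailability, frequency, mean_failure_rate, mean_repair_rate

# Two identical inputs with lambda = 0.001 failure/hour, mu = 0.05 repair/hour.
print(or_gate([0.001, 0.001], [0.05, 0.05]))
```

With unrounded inputs the unavailability comes out near 0.039, slightly
below the value obtained when the single-component unavailability is first
rounded to 0.02.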

4.10.2 AND Gate 

An AND gate is the representation of a parallel system composed of n
nonidentical components. Since the parallel system is a dual of the series
system, the AND gate output event steady-state unavailability, steady-state
failure rate, limiting mean failure rate, and limiting mean repair rate
equations can be obtained directly from (4.56), (4.57), (4.58), and (4.59):

$$\bar{A}_p = \prod_{i \in X} \frac{\lambda_i}{\lambda_i + \mu_i} \tag{4.60}$$

where p denotes a parallel system.

$$\lambda_{ps} = \bar{A}_p \sum_{i \in X} \mu_i \tag{4.61}$$

$$\lambda_{pm} = \frac{\lambda_{ps}}{1 - \bar{A}_p} \tag{4.62}$$

and

$$\mu_{pm} = \sum_{i \in X} \mu_i \tag{4.63}$$
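
As a rough illustration of the duality, the AND gate measures can be
computed by swapping the roles of failure and repair in the OR gate
relations. The sketch below assumes that reading of (4.60) to (4.63); the
name `and_gate` is hypothetical.

```python
from math import prod

def and_gate(failure_rates, repair_rates):
    """Steady-state measures of an AND gate (parallel system of repairable inputs).

    Hypothetical helper: returns (unavailability, failure frequency,
    limiting mean failure rate, limiting mean repair rate).
    """
    # The output is down only when every input is down.
    unavailability = prod(l / (l + m) for l, m in zip(failure_rates, repair_rates))
    mean_repair_rate = sum(repair_rates)                    # dual of (4.58)
    frequency = unavailability * mean_repair_rate           # dual of (4.57)
    mean_failure_rate = frequency / (1.0 - unavailability)  # dual of (4.59)
    return unavailability, frequency, mean_failure_rate, mean_repair_rate

# Two identical inputs with lambda = 0.001 failure/hour, mu = 0.05 repair/hour.
print(and_gate([0.001, 0.001], [0.05, 0.05]))
```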
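Similarly, the respective equations in the case of an m-out-of-n identical
inputs AND gate are

$$\bar{A} = \frac{\sum_{j=m}^{n} \binom{n}{j} (1/\lambda)^{n-j} (1/\mu)^{j}}{(1/\lambda + 1/\mu)^{n}} \tag{4.64}$$

$$\lambda_{s} = \frac{n!\,(1/\lambda)^{n-m} (1/\mu)^{m-1}}{(n-m)!\,(m-1)!\,(1/\lambda + 1/\mu)^{n}} \tag{4.65}$$

$$\lambda_{m} = \frac{n!\,(1/\lambda)^{n-m} (1/\mu)^{m-1}}{(n-m)!\,(m-1)! \sum_{j=0}^{m-1} \binom{n}{j} (1/\lambda)^{n-j} (1/\mu)^{j}} \tag{4.66}$$

and

$$\mu_{m} = \frac{n!\,(1/\lambda)^{n-m} (1/\mu)^{m-1}}{(n-m)!\,(m-1)! \sum_{j=m}^{n} \binom{n}{j} (1/\lambda)^{n-j} (1/\mu)^{j}} \tag{4.67}$$

It is easily seen from the above equations that, for identical inputs, the
OR and AND gate equations are special cases of the m-out-of-n inputs AND
gate equations. In the case of an AND gate, m takes on the value of the
number of inputs to that AND gate, whereas in the case of an OR gate, m
is equal to unity.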
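
The special-case claim can be checked numerically. The sketch below is an
assumption-laden illustration rather than the book's derivation: it works
directly from the single-input unavailability (4.55) and a
frequency-balance argument (the output fails when exactly m - 1 inputs are
down and one of the remaining inputs fails). The function name is
hypothetical.

```python
from math import comb

def m_out_of_n_gate(m, n, lam, mu):
    """Steady-state measures of an m-out-of-n gate with identical inputs.

    The output event occurs when at least m of the n inputs have occurred.
    Returns (unavailability, frequency, mean failure rate, mean repair rate).
    """
    q = lam / (lam + mu)   # single-input steady-state unavailability, (4.55)
    p = 1.0 - q
    unavailability = sum(comb(n, j) * q**j * p**(n - j) for j in range(m, n + 1))
    # Rate of crossing from "m - 1 inputs down" to "m inputs down".
    frequency = comb(n, m - 1) * q**(m - 1) * p**(n - m + 1) * (n - m + 1) * lam
    mean_failure_rate = frequency / (1.0 - unavailability)
    mean_repair_rate = frequency / unavailability
    return unavailability, frequency, mean_failure_rate, mean_repair_rate

# m = 1 behaves as an OR gate, m = n as an AND gate.
print(m_out_of_n_gate(1, 3, 0.001, 0.05))
print(m_out_of_n_gate(3, 3, 0.001, 0.05))
```

For m = 1 the limiting mean failure rate reduces to n times the input
failure rate, and for m = n the limiting mean repair rate reduces to n
times the input repair rate, in line with (4.58) and (4.63).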

Example 8. Suppose the objective in Figure 4.19 is to obtain the top event
steady-state unavailability, steady-state failure frequency, limiting mean
failure rate, and limiting mean repair rate. Assume that all of the basic
events of the fault tree have the same failure and repair rates, that is,
$\lambda$ = 0.001 failure/hour and $\mu$ = 0.05 repair/hour. Furthermore,
assume that all of the basic events are statistically independent.

From (4.55) the single component steady-state unavailability is

$$\bar{A} = \frac{\lambda}{\lambda + \mu} = \frac{0.001}{0.051} \approx 0.02$$

In the case of the OR gate output event GT1, the unavailability from (4.53)
is

$$\bar{A}_s = 1 - (1 - 0.02)^2 \approx 0.04$$

From (4.57),

$$\lambda_{ss} = 0.0019 \text{ failure/hour}$$

Equation 4.58 yields

$$\lambda_{sm} = 0.002 \text{ failure/hour}$$

The limiting mean repair rate is obtained from (4.59):

$$\mu_{sm} = 0.0475 \text{ repair/hour}$$

Similarly, for the AND gate output event, GT2, and the top event OR gate,
GT0, the following information was calculated from (4.60), (4.61), (4.62),
(4.63), (4.56), (4.57), (4.58), and (4.59), respectively. In the case of
the AND gate GT2,

$$\bar{A}_p = 0.0004$$

$$\lambda_{ps} = 0.00004 \text{ failure/hour}$$

$$\lambda_{pm} = 0.00004 \text{ failure/hour}$$

and

$$\mu_{pm} = 0.1 \text{ repair/hour}$$

Similarly, in the case of the top event OR gate, GT0,

$$\bar{A}_s = 0.0404$$

$$\lambda_{ss} = 0.00195 \text{ failure/hour}$$

$$\lambda_{sm} = 0.00204 \text{ failure/hour}$$

and

$$\mu_{sm} = 0.0489 \text{ repair/hour}$$



4.11 LAMBDA TAU METHOD 



This is another method that takes into consideration the repair of the
basic components. The Lambda Tau technique requires redundancy-free
expressions from the fault tree diagram. In other words, the basic events
of the tree must not be repeated events. In many cases such an expression
may be obtained by Boolean substitution reduction techniques. However, this
method incorporates many other restrictions. The Lambda Tau method
calculations for an AND gate are based on the coexistence of all failures,
and the calculations for an OR gate are based upon at least one failure
among n possible failures. The basic formulas for the AND and OR gate
parameters are derived in the Flow Research references 124 and 65. The main
restrictions of this technique are (a) $\tau/T$ is small, where $\tau$ is
the repair time of the component in question and T is the time interval of
interest; (b) the basic event failure rates are very small; (c) the product
of the failure rate and repair time is very small (i.e., must be less than
1); (d) the product of the failure rate and the mission time is very small
(i.e., must be less than 1, preferably 0.1); (e) the failure and repair
rates are constant; and (f) failures occur independently.

The basic formulas for reliability of the AND (AND Priority) and OR gates
are derived in reference 126. AND and OR gate parameter formulas are
presented in the following sections.
4.11.1 AND Gate (Coexistence of All Failures)

The general formula for the probability, $P_{AND}$, that n failures coexist
in a small time interval, dt, for the first time can be obtained:

$$P_{AND} = \left[\prod_{j=1}^{n} \lambda_j\right] \left[\sum_{i=1}^{n} \prod_{j \ne i} \tau_j\right] dt \tag{4.68}$$

where n is the number of components, $\lambda_j$ is the constant failure
rate of the jth component, and $\tau_j$ is its repair time.

The AND gate output event hazard rate (failure rate) and repair time
equations are given by

$$\lambda_{AND} = \left[\prod_{j=1}^{n} \lambda_j\right] \left[\sum_{i=1}^{n} \prod_{j \ne i} \tau_j\right] \tag{4.69}$$

and

$$\tau_{AND} = \left[\sum_{j=1}^{n} \frac{1}{\tau_j}\right]^{-1} \tag{4.70}$$

It is emphasized that (4.68), (4.69), and (4.70) are only valid for the
assumptions outlined in the earlier section.



4.11.2 OR Gate (At Least One Failure Among n Possible Failures)

This gate represents a system with n components connected in a series
configuration. The probability that one or more failures occur is

$$P_{OR} = \sum_{j=1}^{n} \lambda_j \, dt \tag{4.71}$$

The OR gate output event failure rate and repair time are

$$\lambda_{OR} = \sum_{j=1}^{n} \lambda_j \tag{4.72}$$

and

$$\tau_{OR} = \frac{\sum_{j=1}^{n} \lambda_j \tau_j}{\sum_{j=1}^{n} \lambda_j} \tag{4.73}$$

As for the AND gate output event formulas, these equations are only valid
under the assumptions outlined in the earlier sections.

Figure 4.20 A fault tree containing repeated events. (Each basic event has
$\lambda$ = 0.001 failure/hour and $\tau$ = 5 hours.)

Example 9. A fault tree containing a repeated event is shown in Figure
4.20. Assume the occurrence of basic fault events is statistically
independent, then obtain the top event quantitative measures of the Lambda
Tau technique.

As can be seen from the fault tree, the repeated event A has to be
eliminated before we can apply the Lambda Tau technique to compute
quantitative reliability measures.

The output event expression of the gate, GT1, can be simplified by
applying the following Boolean identity:

$$A(A + B) = A \tag{4.74}$$

Therefore the simplified fault tree of Figure 4.20 becomes as shown in
Figure 4.21.

Figure 4.21 A simplified fault tree. (Each basic event has $\lambda$ =
0.001 failure/hour and $\tau$ = 5 hours.)

To determine the top event quantitative measures of the reduced fault tree
shown in Figure 4.21, the OR gate, GT2, output event failure rate and
repair time are obtained by using (4.72) and (4.73):

$$\lambda_{GT2} = \sum_{j=1}^{2} \lambda_j = 0.002 \text{ failure/hour} \tag{4.75}$$

$$\tau_{GT2} = \frac{\sum_{j=1}^{2} \lambda_j \tau_j}{\sum_{j=1}^{2} \lambda_j} = 5 \text{ hours} \tag{4.76}$$

To obtain the quantitative measures of the top event, failure rate and
repair time, use expressions (4.69) and (4.70), respectively:

$$\lambda_{GT0} = \lambda_A \lambda_{GT2} (\tau_A + \tau_{GT2}) = 0.00002 \text{ failure/hour} \tag{4.77}$$

$$\tau_{GT0} = \frac{\tau_{GT2}\, \tau_A}{\tau_{GT2} + \tau_A} = 2.5 \text{ hours} \tag{4.78}$$

For a 100-hour mission, the top event probability that n failures coexist
in time interval dt for the first time is

$$P_{GT0} = \lambda_{GT0}\, T = 0.00002 \times 100 = 0.002/\text{mission} \tag{4.79}$$

where $\lambda_{GT0}$ is obtained from (4.77).
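
The arithmetic of Example 9 can be reproduced with a short script. The
sketch below assumes that the n-input AND gate formulas reduce to the
two-input forms used in (4.77) and (4.78); the helper names `lt_or` and
`lt_and` are hypothetical.

```python
from math import prod

def lt_or(lams, taus):
    """Lambda Tau OR gate: summed failure rate and rate-weighted repair time."""
    lam = sum(lams)
    tau = sum(l * t for l, t in zip(lams, taus)) / lam
    return lam, tau

def lt_and(lams, taus):
    """Lambda Tau AND gate (assumed n-input generalisation of (4.77), (4.78))."""
    # Sum over i of the product of all repair times except the i-th.
    tau_sum = sum(prod(t for k, t in enumerate(taus) if k != i)
                  for i in range(len(taus)))
    lam = prod(lams) * tau_sum
    tau = prod(taus) / tau_sum
    return lam, tau

# Reduced tree of Figure 4.21: GT2 is an OR gate of two basic events,
# and the top gate GT0 is an AND gate of event A and GT2.
lam_gt2, tau_gt2 = lt_or([0.001, 0.001], [5.0, 5.0])
lam_gt0, tau_gt0 = lt_and([0.001, lam_gt2], [5.0, tau_gt2])
p_mission = lam_gt0 * 100.0   # 100-hour mission, as in (4.79)
print(lam_gt2, tau_gt2, lam_gt0, tau_gt0, p_mission)
```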
4.12 REPAIRABLE COMPONENT FAULT TREE EVALUATION
WITH KINETIC THEORY

This is another method to evaluate reliability indices of fault trees with
repairable components. Before applying the kinetic theory, the minimal cut
sets of a fault tree are to be determined. This approach was originated in
reference 68. In this section assume that the component failures are
statistically independent. The major steps to be followed for this
technique are outlined below:

Step 1. Construct the fault tree of the device, subsystem, or system in
question.
Step 2. Determine minimal cut sets of the constructed fault tree.
Step 3. Develop each primary event information of a minimal cut set fault
tree.
Step 4. Similarly, develop each cut set information of the minimal cut set
fault tree in question.
Step 5. Finally, evaluate the fault tree top event information.

To obtain the basic event, cut set, and top event quantitative reliability
information the following notations are used.

Basic Events.

$\lambda$ = the constant failure rate of the basic event or component
$\mu$ = the constant repair rate of the basic event or component
t = mission time
F(t) = probability of a component failed condition at time t
P(t) = probability that a component has its first failure by time t
W = probability that a component fails or a basic fault event occurs in
time interval $[t, t + \Delta t]$
$W_f$ = probability that a component has its first failure in time interval
$[t, t + \Delta t]$

Cut Sets.

$\lambda'(t)$ = the cut set failure rate at time t
$\mu'(t)$ = the cut set repair rate at time t

Top Event.

$\lambda_T(t)$ = the top event failure rate at time t
$\mu_T(t)$ = the top event repair rate at time t
Figure 4.22 A hypothetical fault tree (top event AB + AC + D).

The kinetic tree theory is described in detail in reference 69. Here we
simply deal with the practical aspect of this theory, with the assumptions
that the basic failure and repair rates are constant and the failures are
statistically independent [130]. To demonstrate the practicality of this
approach, the following hypothetical example is presented in Figure 4.22.

Example 10. For the fault tree shown in Figure 4.22, it is necessary to
obtain the top event unavailability and failure and repair rate
information. For the fault tree shown we develop the required information
[130] for the basic events, cut sets, and top event. One should note here
that the constructed fault tree has no repeated events.

Basic failure event information. Assume that a repairable component has
constant failure and repair rates; therefore, by applying the Markov
process concept we obtain the following differential-difference equations
for the component operational and failure states as shown in Figure 4.23.

Figure 4.23 A single component state space diagram.

$$A(t + \Delta t) = (1 - \lambda \Delta t) A(t) + \mu \Delta t\, F(t) \tag{4.80}$$

$$F(t + \Delta t) = (1 - \mu \Delta t) F(t) + \lambda \Delta t\, A(t) \tag{4.81}$$

In the limiting case the above equations become

$$\frac{dA(t)}{dt} = -\lambda A(t) + \mu F(t) \tag{4.82}$$

$$\frac{dF(t)}{dt} = -\mu F(t) + \lambda A(t) \tag{4.83}$$

At t = 0, A(0) = 1 and the other initial condition probabilities are equal
to zero, where A(t) is the component availability at time t and F(t) is
the component unavailability at time t. By solving the above differential
equations we get

$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t} \tag{4.84}$$

$$F(t) = \frac{\lambda}{\lambda + \mu} - \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t} \tag{4.85}$$

For large t, (4.85) becomes

$$F = \frac{\lambda}{\lambda + \mu} \tag{4.86}$$

To obtain the failure probability at time t, set $\mu = 0$ in (4.85):

$$F(t) = 1 - e^{-\lambda t} \tag{4.87}$$

For small $\lambda t$ the above equation may be approximated to

$$F(t) \approx \lambda t \tag{4.88}$$
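
As a quick numerical check, the difference equations (4.80) and (4.81)
can be stepped forward with a small time increment and compared with the
closed-form unavailability (4.85). The sketch below is illustrative only;
the function names and the step size are assumptions chosen here.

```python
from math import exp

def unavailability(lam, mu, t):
    """Closed-form component unavailability F(t), as in (4.85)."""
    return lam / (lam + mu) * (1.0 - exp(-(lam + mu) * t))

def stepped_unavailability(lam, mu, t, dt=0.01):
    """Iterate the difference equations (4.80) and (4.81) from A(0) = 1."""
    a, f = 1.0, 0.0
    for _ in range(int(round(t / dt))):
        # Both states are updated from the values at the start of the step.
        a, f = ((1.0 - lam * dt) * a + mu * dt * f,
                (1.0 - mu * dt) * f + lam * dt * a)
    return f

lam, mu = 0.001, 0.05
print(unavailability(lam, mu, 100.0), stepped_unavailability(lam, mu, 100.0))
print(unavailability(lam, mu, 1e6), lam / (lam + mu))   # large-t limit (4.86)
```

Setting the repair rate to zero in the same function reproduces the
nonrepairable case, which for small values of the failure rate times time
is close to that product.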
Now we define the following expression for the probability that a
component fails in a time interval $[t, t + \Delta t]$:

$$W(t)\Delta t = [1 - F(t)]\lambda \Delta t \tag{4.89}$$

One should note that (4.89) is the second part of the right-hand side of
(4.81). By substituting (4.85) into (4.89) for large t we get

$$W = W(t)\Delta t = \frac{\lambda \mu}{\lambda + \mu} \Delta t \tag{4.90}$$

Similarly, for a nonrepairable component (i.e., $\mu = 0$) substitute
(4.87) into (4.89) for small $\lambda t$; we get

$$w = W(t)\Delta t = \lambda \Delta t \tag{4.91}$$

Cut set information. To demonstrate how to obtain the cut set information
we will use the fault tree example shown in Figure 4.22. The top event
expression of Figure 4.22 is composed of the following cut sets:

$$\text{Top event} = AB + AC + D \tag{4.92}$$

Now consider the cut set AB in (4.92). Here, we are interested in finding
the probability of first failure in time interval $[t, t + \Delta t]$.

There are two possibilities for encountering failure of the cut set AB in
a small interval $\Delta t$ (here we assume that only one failure occurs in
a small time $\Delta t$):

1. A is in failed state and B fails in $\Delta t$
2. B is in failed state and A fails in $\Delta t$

Thus we define $W_{AB}$ as the probability of first failure of the cut set
AB in the time interval $[t, t + \Delta t]$. Therefore,

$$W_{AB} = F_A W_B + F_B W_A \tag{4.93}$$

In the case of repairable events A and B substitute (4.86) and (4.90) into
(4.93):

$$W_{AB} = \frac{\lambda_A}{\lambda_A + \mu_A} \cdot \frac{\lambda_B \mu_B}{\lambda_B + \mu_B} \Delta t + \frac{\lambda_B}{\lambda_B + \mu_B} \cdot \frac{\lambda_A \mu_A}{\lambda_A + \mu_A} \Delta t \tag{4.94}$$

and

$$W_{AB} = \frac{\lambda_A \lambda_B (\mu_A + \mu_B)}{(\lambda_A + \mu_A)(\lambda_B + \mu_B)} \Delta t \tag{4.95}$$
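Similarly, for cut sets AC and D we obtain the following expressions:

$$W_{AC} = F_A W_C + F_C W_A \tag{4.96}$$

$$W_{AC} = \frac{\lambda_A \lambda_C (\mu_A + \mu_C)}{(\lambda_A + \mu_A)(\lambda_C + \mu_C)} \Delta t \tag{4.97}$$

$$W_D = W_D \tag{4.98}$$

$$W_D = \frac{\lambda_D \mu_D}{\lambda_D + \mu_D} \Delta t \tag{4.99}$$

To determine the probability that the cut sets AB, AC, and D are in the
failed state one should multiply the individual event probabilities for
the statistically independent events. Thus, by utilizing (4.86), we obtain

$$F_{AB} = F_A F_B = \frac{\lambda_A \lambda_B}{(\lambda_A + \mu_A)(\lambda_B + \mu_B)} \tag{4.100}$$

$$F_{AC} = F_A F_C = \frac{\lambda_A \lambda_C}{(\lambda_A + \mu_A)(\lambda_C + \mu_C)} \tag{4.101}$$

$$F_D = \frac{\lambda_D}{\lambda_D + \mu_D} \tag{4.102}$$

Suppose event B of the cut set AB is not repaired; we then denote it with
a lowercase letter, b. Therefore one may rewrite (4.93) as

$$W_{Ab} = F_A W_b + F_b W_A \tag{4.103}$$

In the case of repairable and nonrepairable events A and b, respectively,
substitute equations (4.91), (4.90), (4.86), and (4.88) into (4.103) to
obtain

$$W_{Ab} = \frac{\lambda_A}{\lambda_A + \mu_A} \lambda_b \Delta t + \lambda_b t \frac{\lambda_A \mu_A}{\lambda_A + \mu_A} \Delta t \tag{4.104}$$

Hence,

$$W_{Ab} = \frac{\lambda_A \lambda_b (1 + \mu_A t)}{\lambda_A + \mu_A} \Delta t \tag{4.105}$$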



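To obtain the cut set failure rate we use (4.89):

$$\lambda'(t) = \frac{W(t)}{1 - F(t)} \tag{4.106}$$

Similarly, the cut set repair rate is obtained by using equation (4.107):

$$\mu'(t) = \frac{W(t)}{F(t)} \tag{4.107}$$

since

$$W = W(t)\Delta t \tag{4.108}$$

Now consider the cut set AB. Since $W_{AB}$ is known from (4.95), we
substitute (4.95) into (4.108) to get

$$W_{AB}(t) = \frac{\lambda_A \lambda_B (\mu_A + \mu_B)}{(\lambda_A + \mu_A)(\lambda_B + \mu_B)} \tag{4.109}$$

To obtain the cut set AB repair rate, substitute (4.100) and (4.109) into
(4.107) to get

$$\mu'_{AB} = \mu_A + \mu_B \tag{4.110}$$

Similarly, to obtain the cut set failure rate, substitute (4.100) and
(4.109) into (4.106) to get

$$\lambda'_{AB} = \frac{\lambda_A \lambda_B (\mu_A + \mu_B)}{(\lambda_A + \mu_A)(\lambda_B + \mu_B) - \lambda_A \lambda_B} \tag{4.111}$$

In similar fashion, one can obtain the failure and repair rates for cut
sets AC and D as follows:

$$\lambda'_{AC} = \frac{\lambda_A \lambda_C (\mu_A + \mu_C)}{(\lambda_A + \mu_A)(\lambda_C + \mu_C) - \lambda_A \lambda_C} \tag{4.112}$$

$$\lambda'_D = \lambda_D \tag{4.113}$$

and

$$\mu'_{AC} = \mu_A + \mu_C \tag{4.114}$$

$$\mu'_D = \mu_D \tag{4.115}$$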
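
The cut set quantities can be checked numerically for a two-event cut
set. The sketch below is only illustrative and rests on two assumptions:
the cut set failure rate is taken as W(t)/(1 - F) by analogy with (4.89),
and the cut set repair rate as W(t)/F, which reproduces (4.110). The
function name and the sample rates are hypothetical.

```python
def cut_set_pair(lam_a, mu_a, lam_b, mu_b):
    """Steady-state measures of a two-event cut set of repairable components.

    Returns (failure intensity W(t), unavailability F, failure rate, repair rate).
    """
    f_a = lam_a / (lam_a + mu_a)            # (4.86) for event A
    f_b = lam_b / (lam_b + mu_b)            # (4.86) for event B
    w_a = lam_a * mu_a / (lam_a + mu_a)     # (4.90) per unit time for A
    w_b = lam_b * mu_b / (lam_b + mu_b)     # (4.90) per unit time for B
    w = f_a * w_b + f_b * w_a               # (4.93) per unit time
    f = f_a * f_b                           # (4.100)
    return w, f, w / (1.0 - f), w / f

w, f, failure_rate, repair_rate = cut_set_pair(0.001, 0.05, 0.002, 0.1)
print(w, f, failure_rate, repair_rate)
```

With these sample rates the repair rate of the cut set equals the sum of
the two component repair rates, as (4.110) states.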

Top event information. To obtain the top event probability information,
one has to take advantage of the union of the minimal cut sets, since the
occurrence of any one of the cut sets will cause the top event to occur.
The probability of the union of the cut sets $T_1, T_2, \ldots, T_n$ is
given by

$$
\begin{aligned}
P(T_1 + T_2 + \cdots + T_n) ={}& [P(T_1) + P(T_2) + \cdots + P(T_n)] && (n \text{ terms}) \\
&- [P(T_1 T_2) + P(T_1 T_3) + \cdots + P(T_i T_j)] && \left(\tbinom{n}{2} \text{ terms}\right) \\
&+ [P(T_1 T_2 T_3) + P(T_1 T_2 T_4) + \cdots + P(T_i T_j T_k)] && \left(\tbinom{n}{3} \text{ terms}\right) \\
&- \cdots + (-1)^{n-1} P(T_1 T_2 \cdots T_n) && \left(\tbinom{n}{n} \text{ term}\right)
\end{aligned}
\tag{4.116}
$$

Consider now the following top event minimal cut set expression of Figure
4.22:

$$T = AB + AC + D \tag{4.117}$$

The probability expression of the above expression becomes

$$F(AB + AC + D) = F(AB) + F(AC) + F(D) - F(ABC) - F(ABD) - F(ACD) + F(ABCD) \tag{4.117}$$

For statistically independent events

$$F_{TOP} = F(AB + AC + D) = F_A F_B + F_A F_C + F_D - F_A F_B F_C - F_A F_B F_D - F_A F_C F_D + F_A F_B F_C F_D \tag{4.118}$$

Now consider (4.117); it contains event A, which is common to both cut
sets AB and AC. The occurrence of this common event A will cause the
simultaneous failure of cut sets AB and AC, if the component A fails in
interval $[t, t + \Delta t]$. The probability expression of this
intersection is given by

$$W_{ABC} = W_A F_B F_C \tag{4.119}$$

Therefore by substituting (4.86) and (4.90) into the above expression we
get

$$W_{ABC} = \frac{\lambda_A \mu_A \lambda_B \lambda_C}{(\lambda_A + \mu_A)(\lambda_B + \mu_B)(\lambda_C + \mu_C)} \Delta t \tag{4.120}$$

When obtaining $W_{TOP}$, one should be careful that it is composed of two
states:

1. All cut sets are operating at time t.
2. A cut set fails in a time interval $[t, t + \Delta t]$.

Thus we write an expression for $W_{TOP}$ as follows:

$$W_{TOP} = W_{AB}(1 - F_C)(1 - F_D) + W_{AC}(1 - F_B)(1 - F_D) + W_D(1 - F_A F_B)(1 - F_A F_C) - W_{ABC}(1 - F_D) \tag{4.121}$$

Because event A is common to both factors, the term
$W_D(1 - F_A F_B)(1 - F_A F_C)$ of (4.121) becomes in simplified form

$$W_D(1 - F_A F_B - F_A F_C + F_A F_B F_C)$$

The top event failure and repair rates, $\lambda_{TOP}$ and $\mu_{TOP}$,
for Figure 4.22 may be obtained by substituting (4.121) and (4.118) into
the following expressions:

$$\lambda_{TOP} = \frac{W_{TOP}(t)}{1 - F_{TOP}} \tag{4.122}$$

and

$$\mu_{TOP} = \frac{W_{TOP}(t)}{F_{TOP}} \tag{4.123}$$
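
The inclusion-exclusion expansion for the cut sets AB, AC, and D can be
verified against a brute-force enumeration of component states. The
sketch below is illustrative; the function names and the sample component
unavailability are assumptions made here. An event shared by several cut
sets (here A) is counted once in each product.

```python
from itertools import combinations, product
from math import prod

def union_probability(cut_sets, q):
    """Inclusion-exclusion over minimal cut sets; q maps event -> unavailability."""
    total = 0.0
    for r in range(1, len(cut_sets) + 1):
        for combo in combinations(cut_sets, r):
            events = set().union(*combo)   # a shared event appears only once
            total += (-1) ** (r + 1) * prod(q[e] for e in events)
    return total

def brute_force(cut_sets, q):
    """Sum the probability of every component state in which some cut set is down."""
    events = sorted(q)
    total = 0.0
    for states in product([0, 1], repeat=len(events)):
        failed = {e for e, s in zip(events, states) if s}
        if any(set(cs) <= failed for cs in cut_sets):
            total += prod(q[e] if s else 1.0 - q[e] for e, s in zip(events, states))
    return total

q = {e: 0.001 / 0.051 for e in "ABCD"}   # sample steady-state unavailabilities
cut_sets = ["AB", "AC", "D"]
print(union_probability(cut_sets, q), brute_force(cut_sets, q))
```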



4.13 ADVANTAGES AND DISADVANTAGES OF THE FAULT 
TREE TECHNIQUE 

Like any other technique, the fault tree technique has its advantages and 
disadvantages: 

Advantages. 

1. It provides insight into the system behavior.

2. It requires the reliability analyst to understand the system thoroughly 
and deal specifically with one particular failure at a time. 

3. It helps to ferret out failures deductively. 

4. It provides a visibility tool to designers, users, and management to 
justify design changes and trade-off studies. 

5. It provides options to perform quantitative or qualitative reliability
analysis.

6. This technique can handle complex systems more easily.

Disadvantages.

1. This is a costly and time-consuming technique.

2. Its results are difficult to check.

3. This technique normally considers that the system components are in
either working or failed state. Therefore, the partial failure states of
components are difficult to handle.

4. Analytical solutions for fault trees containing stand-bys and repairable
priority gates are difficult to obtain for the general case.

5. Including all types of common-cause failures requires a considerable
effort.

4.14 COMMON-CAUSE FAILURES

As the field of reliability engineering becomes a recognized discipline in
engineering, so does the awareness of associated problems such as
common-cause failures, which were overlooked some years ago. In recent
years common-cause failures have received widespread attention in the
reliability analysis of redundant components, units, or systems, because
the assumption of statistically independent failure of redundant units is
easily violated in practice [93]. This may easily be verified from
reference 116, which reports the frequency of common-cause failure in the
U.S. power reactor industry: "Of 379 components failures or groups of
failures arising from independent causes, 78 involved common-cause failures
of two or more components."

A common-cause failure is defined in reference 105 as any instance
where multiple units or components fail due to a single cause. Some of the
common-cause failures may occur due to:

1. Equipment design deficiency. This includes those failures that may have
been overlooked during the design phase of the equipment or system,
and may be due to the interdependence between electrical and mechanical
subsystems or components of a redundant system.

2. Operations and maintenance errors. These errors may occur due to
improper adjustment or calibration, carelessness, improper maintenance,
etc.

3. External normal environment. This includes causes such as dust, dirt,
humidity, temperature, moisture, and vibration. These may be the
normal extremes of the operating environment.

4. External catastrophe. This includes natural external phenomena such as
flood, earthquake, fire, and tornado. The occurrence of any one of these
events may affect the redundant system at a plant.

5. Common manufacturer. The redundant equipment or component procured
from the same manufacturer may have the same design or
fabrication errors. For example, the fabrication errors may occur due to
use of wrong material, wiring a circuit board backward, poor soldering,
etc.

6. Common external power source. A common-cause failure may occur due
to the common external power source of the redundant equipment,
subsystem, or unit.

7. Functional deficiency. This may occur due to inappropriate
instrumentation or inadequacy of designed protective action.
There are several examples of common-cause failures in nuclear power
systems [114]. Some spring-loaded relays in a parallel configuration fail
simultaneously due to a common cause. Furthermore, due to a maintenance
error of incorrectly disengaging the clutches, two motorized valves
are placed in a failed state. In addition, a steam line rupture causing
multiple circuit board failures is another example. The common cause is
the steam line rupture in this case. In some cases, instead of triggering a
complete redundant system failure (simultaneous failure), which is the
extreme case, the common cause may produce a less severe but common
degradation of the redundant unit. This will increase the joint probability
of failure of the system units. It may be due to a harsh accident
environment. In this degradation state, the redundant unit may fail at a
time later than the first unit failure. Because of the common morose
environment, the second unit failure is dependent on and coupled to the
first unit failure.

Although the existence of common-cause failures has been recognized
for a long time, no concrete steps were taken to represent them
systematically until the late 1960s. Most of the literature on the subject
is presented in the bibliography on common-cause failures [93].

Some of the newly established theory and models to analyze common-cause
failures are presented in this section.



4.14.1 Common-Cause Failure Analysis of Reliability Networks

In this section we present a newly developed method [88, 101] to analyze
active identical units with statistically independent and dependent
(common-cause) failures. However, this method may be extended to other
reliability models and probability densities. To develop this method, it
was assumed that each unit has a certain amount of common-cause failures.

Since from past experience [101] it is known that common-cause failures
occur in real life, the parameter $\alpha$ is introduced into the newly
developed formulas to include common-cause failures [88]. The parameter
$\alpha$ can be obtained from the operating experience data of the
redundant system or equipment:

$\alpha$ = fraction of unit failures that are common cause

The above parameter can be considered a point estimate of the conditional
probability that a unit failure is common cause. A unit failure rate
$\lambda$ can be considered to have two mutually exclusive components,
$\lambda_1$ and $\lambda_2$, that is,

$$\lambda = \lambda_1 + \lambda_2 \tag{4.124}$$

where $\lambda_1$ = the unit independent mode constant failure rate
$\lambda_2$ = the redundant system or unit constant common-cause failure
rate

Since

$$\alpha = \frac{\lambda_2}{\lambda} \tag{4.125}$$

$$\therefore \lambda_2 = \alpha \lambda \tag{4.126}$$

and $\lambda_1$ can be obtained from (4.124) by substituting (4.126)

$$\therefore \lambda_1 = (1 - \alpha)\lambda \tag{4.127}$$

The system reliability, hazard rate, and MTTF formulas as well as the
graphical plots are developed for a parallel, k-out-of-n, series, and a
bridge network as discussed in the following sections.

A Parallel Network. The modified identical units parallel network is
shown in Figure 4.24. It is simply a parallel network with a unit in series.
The parallel stage (i.e., labeled "1") of Figure 4.24 represents all the
independent failures for any n unit system. The series unit stage labeled
"2" in Figure 4.24 represents all the common-cause failures of the system.

Figure 4.24 A modified identical units parallel network.

The hypothetical common-cause failure unit is connected in series with the
independent failure mode units. A failure of the hypothetical series unit
(i.e., the common-cause failure) will cause the system failure. It is
assumed that all the common-cause failures are completely coupled. The
system reliability $R_p$ of Figure 4.24 can be written as

$$R_p = \{1 - (1 - R_1)^n\} R_2 \tag{4.128}$$

where n = the number of identical units
$R_1$ = the unit's independent failure mode reliability
$R_2$ = the system common failure mode reliability

For constant failure rates $\lambda_1$ and $\lambda_2$ from (4.126) and
(4.127) and for reliabilities $R_1$ and $R_2$, (4.128) can be rewritten as

$$R_p(t) = \{1 - (1 - e^{-(1 - \alpha)\lambda t})^n\} e^{-\alpha \lambda t} \tag{4.129}$$

where t is the time.

The reliability plots of (4.129) are shown for n = 2, 3, 4 in Figures 4.25,
4.26, and 4.27, respectively. These plots clearly show the effect of
common-cause failure on the parallel system. As the value of $\alpha$
increases, the reliability of the parallel system decreases.

Figure 4.25 A two-parallel-units reliability plot.

Figure 4.26 A three-parallel-units reliability plot.

The parameter $\alpha$ takes values from zero to one. At $\alpha = 0$, the
modified parallel network simply acts as an ordinary parallel network;
however, at $\alpha = 1$ the modified redundant parallel system just acts
as a single unit. This means that all the system failures are common-cause
failures.
The system hazard rate can be obtained from

$$\lambda(t) = -\frac{1}{R(t)} \frac{dR(t)}{dt} \tag{4.130}$$

The modified parallel system hazard rate $\lambda_p(t)$ is derived by
substituting (4.129) into (4.130):

$$\lambda_p(t) = \alpha \lambda + n \lambda (1 - \alpha) \left[\frac{\theta - 1}{\theta^n - 1}\right] \tag{4.131}$$

where

$$\theta = \frac{1}{1 - e^{-(1 - \alpha)\lambda t}}$$

The hazard rate plot is shown in Figure 4.28. The MTTF can be obtained
from

$$\text{MTTF} = \int_0^\infty R(t)\, dt \tag{4.132}$$

The modified parallel system MTTF is obtained by substituting (4.129)
into (4.132), that is,

$$\text{MTTF}_p = \sum_{j=1}^{n} \frac{(-1)^{j+1} \binom{n}{j}}{\lambda \{j - (j - 1)\alpha\}} \tag{4.133}$$

Example 11. Using the following known data for $\alpha$, $\lambda$, and t,
compute the reliability of a two identical units parallel system:

$\lambda$ = 0.001 failure/hour
$\alpha$ = 0.071
t = 200 hours

Solution. The reliability of the two identical units parallel system
subject to common-cause failures = 0.95769. The reliability of the two
units parallel system subject to independent failures only = 0.96714.
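
The parallel network calculation can be reproduced with a few lines of
Python. This is a sketch under stated assumptions: the reliability
follows the series arrangement of an n-unit parallel stage (independent
rate (1 - alpha) times lambda) with a common-cause unit (rate alpha times
lambda), and the MTTF comes from expanding that expression binomially and
integrating term by term. The function names are hypothetical.

```python
from math import comb, exp

def parallel_cc_reliability(n, lam, alpha, t):
    """Reliability of n identical parallel units with a series common-cause unit."""
    r1 = exp(-(1.0 - alpha) * lam * t)   # unit independent-mode reliability
    r2 = exp(-alpha * lam * t)           # common-cause unit reliability
    return (1.0 - (1.0 - r1) ** n) * r2

def parallel_cc_mttf(n, lam, alpha):
    """MTTF from term-by-term integration of the expanded reliability."""
    return sum((-1) ** (j + 1) * comb(n, j) / (lam * (j - (j - 1) * alpha))
               for j in range(1, n + 1))

# Data of Example 11: lambda = 0.001 failure/hour, alpha = 0.071, t = 200 hours.
print(parallel_cc_reliability(2, 0.001, 0.071, 200.0))
print(parallel_cc_reliability(2, 0.001, 0.0, 200.0))   # independent failures only
print(parallel_cc_mttf(2, 0.001, 0.071))
```

The two reliabilities agree with the values quoted in Example 11 to about
three decimal places; at alpha = 1 the MTTF collapses to that of a single
unit.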



k-out-of-n System. The modified identical units k-out-of-n system has a
hypothetical unit for the common-cause failure connected in series with
the independent failure mode k-out-of-n units. The series-connected
hypothetical unit represents the system or unit common-cause failures. A
failure associated with this hypothetical unit will cause the overall
system failure. The modified k-out-of-n identical units system
reliability, $R_{k/n}$, can be obtained from

$$R_{k/n} = \left\{\sum_{i=k}^{n} \binom{n}{i} R_1^i (1 - R_1)^{n-i}\right\} R_2 \tag{4.134}$$

where $R_1$ = the unit independent failure mode reliability
$R_2$ = the k-out-of-n identical units system common-cause failure
reliability

For the constant failure rates $\lambda_1$ and $\lambda_2$ from (4.126) and
(4.127), (4.134) can be rewritten as

$$R_{k/n}(t) = \left\{\sum_{i=k}^{n} \binom{n}{i} e^{-i(1 - \alpha)\lambda t} \left(1 - e^{-(1 - \alpha)\lambda t}\right)^{n-i}\right\} e^{-\alpha \lambda t} \tag{4.135}$$

The graphical plots of (4.135) for 2-out-of-3 units, 2-out-of-4 units, and
3-out-of-4 units are shown in Figures 4.29, 4.30, and 4.31, respectively.
As the value of $\alpha$ increases, the system reliability decreases for a
small value of $\lambda t$, as can be verified from Figures 4.29, 4.30, and
4.31.



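Figure 4.29 A 2-out-of-3 units reliability plot.

Figure 4.30 A 2-out-of-4 units reliability plot.

Figure 4.31 A 3-out-of-4 units reliability plot.

The k-out-of-n system hazard rate $\lambda_{k/n}(t)$ and MTTF can be
obtained by substituting (4.135) into (4.130) and (4.132), respectively,
that is,

$$\lambda_{k/n}(t) = \alpha \lambda + (1 - \alpha)\lambda\, \frac{n \binom{n-1}{k-1} (1 - \theta)^{k} \theta^{n-k}}{\sum_{i=k}^{n} \binom{n}{i} (1 - \theta)^{i} \theta^{n-i}} \tag{4.136}$$

where

$$\theta = 1 - e^{-(1 - \alpha)\lambda t}$$

and

$$
\begin{aligned}
\text{MTTF} = \sum_{r=k}^{n} \binom{n}{r} \Bigg[ & \frac{1}{(r - r\alpha + \alpha)\lambda} - \frac{(n - r)}{(r - r\alpha + 1)\lambda} + \frac{(n - r)(n - r - 1)}{2!\,(r - r\alpha - \alpha + 2)\lambda} \\
& - \frac{(n - r)(n - r - 1)(n - r - 2)}{3!\,(r + 3 - r\alpha - 2\alpha)\lambda} + \cdots \Bigg]
\end{aligned}
\tag{4.137}
$$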

Example 12. For the following given hypothetical values of $\lambda$, t,
and $\alpha$, calculate the system reliability of a 2-out-of-3 units
system:

$\lambda$ = 0.0005 failure/hour
$\alpha$ = 0.3
t = 200 hours

From (4.135) the reliability of the system with common-cause failures is
of the order of 0.95772, as compared with the system reliability, 0.97455,
with no common-cause failure.
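
The figures of Example 12 can be reproduced numerically. The sketch below
assumes the same structure as above: a k-out-of-n stage of identical
units failing independently at rate (1 - alpha) times lambda, in series
with a common-cause unit of rate alpha times lambda. The function name is
hypothetical.

```python
from math import comb, exp

def k_out_of_n_cc_reliability(k, n, lam, alpha, t):
    """Reliability of a k-out-of-n system with a series common-cause unit."""
    r1 = exp(-(1.0 - alpha) * lam * t)   # unit independent-mode reliability
    r2 = exp(-alpha * lam * t)           # common-cause unit reliability
    # At least k of the n units must survive their independent failure mode.
    stage = sum(comb(n, i) * r1**i * (1.0 - r1) ** (n - i) for i in range(k, n + 1))
    return stage * r2

# Data of Example 12: 2-out-of-3, lambda = 0.0005 failure/hour, t = 200 hours.
print(k_out_of_n_cc_reliability(2, 3, 0.0005, 0.3, 200.0))   # with common cause
print(k_out_of_n_cc_reliability(2, 3, 0.0005, 0.0, 200.0))   # independent only
```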
Parallel-Series Network. This system is composed of independent failure
mode identical units and paths with a hypothetical common-cause failure
unit. The modified parallel-series network reliability, $R_{ps}$, can be
obtained from

$$R_{ps} = \{1 - (1 - R_1^m)^n\} R_2 \tag{4.138}$$

where m = the number of identical units in a path
n = the number of identical paths

For the constant failure rates $\lambda_1$ and $\lambda_2$ the above
equation becomes

$$R_{ps}(t) = \left[1 - \left(1 - e^{-m(1 - \alpha)\lambda t}\right)^n\right] e^{-\alpha \lambda t} \tag{4.139}$$

The parallel-series network hazard rate $\lambda_{ps}(t)$ and MTTF can be
obtained by substituting (4.139) into (4.130) and (4.132) as follows:

$$\lambda_{ps}(t) = \alpha \lambda + m n (1 - \alpha)\lambda \left[\frac{\theta - 1}{\theta^n - 1}\right] \tag{4.140}$$

where $\theta = 1/\left(1 - e^{-m(1 - \alpha)\lambda t}\right)$ and

$$\text{MTTF} = \sum_{j=1}^{n} \frac{(-1)^{j+1} \binom{n}{j}}{\lambda \{jm - (jm - 1)\alpha\}} \tag{4.141}$$
A Bridge Network. This system is composed of an independent failure
mode identical units bridge network in series with a hypothetical
common-cause failure unit for the bridge structure. If the hypothetical
common-cause failure unit fails, the overall system fails. The modified
bridge network reliability [127] can be obtained from

$$R_b = \{1 - 2(1 - R_1)^5 + 5(1 - R_1)^4 - 2(1 - R_1)^3 - 2(1 - R_1)^2\} R_2 \tag{4.142}$$

where $R_b$ is the reliability of the bridge network subject to
common-cause failures.

For the constant failure rates $\lambda_1$ and $\lambda_2$ from (4.126) and
(4.127), (4.142) can be rewritten as

$$R_b(t) = \left[1 - 2(1 - e^{-At})^5 + 5(1 - e^{-At})^4 - 2(1 - e^{-At})^3 - 2(1 - e^{-At})^2\right] e^{-\alpha \lambda t} \tag{4.143}$$

where $A = (1 - \alpha)\lambda$. The reliability plots of (4.143) are shown
in Figure 4.32 for varying values of the parameter $\alpha$. For a small
value of $\lambda t$, the bridge network reliability decreases as the value
of the parameter $\alpha$ increases.

The bridge network hazard rate, $\lambda_b(t)$, and the MTTF can be
obtained by substituting (4.143) into (4.130) and (4.132), respectively:

$$\lambda_b(t) = \alpha \lambda + \frac{A(-10v^5 + 30v^4 - 26v^3 + 2v^2 + 4v)}{1 - 2v^5 + 5v^4 - 2v^3 - 2v^2} \tag{4.144}$$

where $v = 1 - e^{-At}$ and

$$\text{MTTF} = \frac{2}{(2 - \alpha)\lambda} + \frac{2}{(3 - 2\alpha)\lambda} - \frac{5}{(4 - 3\alpha)\lambda} + \frac{2}{(5 - 4\alpha)\lambda} \tag{4.145}$$

Example 13. Suppose an identical units bridge network has the following
known values for its parameters $\lambda$ and $\alpha$. Calculate the
bridge reliability for 200 hours, that is,

$\lambda$ = 0.0005 failure/hour
$\alpha$ = 0.3
t = 200 hours

Solution. The reliability of the bridge network subject to common-cause
failures is of the order of 0.96 [from (4.143)] as compared to the
independent failure mode bridge reliability of 0.984 (i.e., for
$\alpha = 0$).
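
The bridge reliability with common-cause failures can be evaluated
directly. The sketch below is a minimal illustration, assuming the
five-unit identical bridge polynomial in the unit unreliability multiplied
by the common-cause unit reliability; the function name is hypothetical.

```python
from math import exp

def bridge_cc_reliability(lam, alpha, t):
    """Reliability of an identical-units bridge with a series common-cause unit."""
    q = 1.0 - exp(-(1.0 - alpha) * lam * t)   # unit independent-mode unreliability
    r2 = exp(-alpha * lam * t)                # common-cause unit reliability
    return (1.0 - 2.0 * q**5 + 5.0 * q**4 - 2.0 * q**3 - 2.0 * q**2) * r2

# Data of Example 13: lambda = 0.0005 failure/hour, alpha = 0.3, t = 200 hours.
print(bridge_cc_reliability(0.0005, 0.3, 200.0))
```

For the data of Example 13 this evaluates to roughly 0.96.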

4.14.2 A Common-Cause Failure Availability Model

To perform meaningful reliability analysis of a two nonidentical units
system with common-cause failures, we present a model taken from
reference 92. The following assumptions were made to develop this
mathematical model:

1. Common-cause and other failures are statistically independent.

2. Common-cause failures can only occur with more than one unit.

3. If either one of the active redundant units fails, the unit is repaired.
In addition, when both units fail, the system is repaired.

4. The common-cause unit failure and repair rates are constant.

When both units are failed, repair is dependent on the following three
cases:

Case 1. The failed component replacements, repair facilities, and skilled
craftsmen are available to repair both units.

Case 2. The failed component replacements, repair facilities, and skilled
craftsmen are available to repair one unit only.

Case 3. Neither Case 1 nor Case 2 is applicable due to nonavailability of
the failed component replacements, tools, or skilled craftsmen.
Furthermore, the units may be queuing at a repair facility.

In Case 1 both units can be repaired simultaneously; however, in Case 2
only one unit can be repaired at a time. For the last case (Case 3) the
units can only be repaired once the craftsmen and the replacements for the
failed components become available.

The following notation and abbreviations were used to formulate this
availability model:

$P_0(t)$ = probability that at time $t$ both units are operational
$P_1(t)$ = probability that at time $t$ unit 1 has failed and unit 2 is operational
$P_2(t)$ = probability that at time $t$ unit 2 has failed and unit 1 is operational
$P_3(t)$ = probability that at time $t$ units 1 and 2 have failed
$P_4(t)$ = probability that at time $t$ the failed component replacements and
repairmen are available to repair both units
$\lambda_i$ = constant failure rate of units 1 and 2, respectively, for $i = 1, 2$
$\mu_i$ = constant repair rate of units 1 and 2, respectively, for $i = 1, 2$
$\mu_3$ = constant repair rate of units 1 and 2
$\alpha$ = constant rate of repairmen availability and component replacements
$\beta$ = constant common-cause failure rate
$t$ = time

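Mathematical Model. The system of first-order differential equations [129]
associated with Figure 4.33 is

$$P_0'(t) = -\left(\sum_{i=1}^{2}\lambda_i + \beta\right)P_0(t) + \sum_{i=1}^{2}\mu_i P_i(t) + \mu_3 P_4(t) \tag{4.146}$$

$$P_1'(t) = -(\lambda_2 + \mu_1)P_1(t) + P_3(t)\mu_2 + P_0(t)\lambda_1 \tag{4.147}$$

$$P_2'(t) = -(\lambda_1 + \mu_2)P_2(t) + P_0(t)\lambda_2 + P_3(t)\mu_1 \tag{4.148}$$

$$P_3'(t) = -\left(\sum_{i=1}^{2}\mu_i + \alpha\right)P_3(t) + P_0(t)\beta + P_1(t)\lambda_2 + P_2(t)\lambda_1 \tag{4.149}$$

$$P_4'(t) = -\mu_3 P_4(t) + P_3(t)\alpha \tag{4.150}$$

At $t = 0$, $P_0(0) = 1$ and the other initial condition probabilities are equal
to zero; the prime represents differentiation with respect to time $t$.

The following steady-state equations are obtained from (4.146)-(4.150) by
setting the derivatives with respect to time $t$ equal to zero:

$$-\left(\sum_{i=1}^{2}\lambda_i + \beta\right)P_0 + \sum_{i=1}^{2}\mu_i P_i + \mu_3 P_4 = 0 \tag{4.151}$$

$$-(\lambda_2 + \mu_1)P_1 + P_3\mu_2 + P_0\lambda_1 = 0 \tag{4.152}$$

$$-(\lambda_1 + \mu_2)P_2 + P_0\lambda_2 + P_3\mu_1 = 0 \tag{4.153}$$

$$-\left(\sum_{i=1}^{2}\mu_i + \alpha\right)P_3 + P_0\beta + P_1\lambda_2 + P_2\lambda_1 = 0 \tag{4.154}$$

$$-\mu_3 P_4 + P_3\alpha = 0 \tag{4.155}$$

$$\sum_{i=0}^{4} P_i - 1 = 0 \tag{4.156}$$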



Solving the above system of simultaneous equations yields closed-form
expressions for the steady-state probabilities $P_0$ through $P_4$
[equations (4.157)-(4.162)].

The steady-state system availability can be obtained from

$$\text{system availability} = \sum_{i=0}^{2} P_i \tag{4.163}$$
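
As a numerical cross-check on the steady-state analysis, the following minimal Python sketch solves the balance equations of a five-state availability model and sums the probabilities of the operational states. It is an illustration under stated assumptions, not the reference's method: the transitions out of states 3 and 4 (rate $\alpha$ from state 3 to state 4, rate $\mu_3$ from state 4 to state 0) are inferred from the notation list, the function names are hypothetical, and the rate values in the usage note are arbitrary.

```python
def generator(lam1, lam2, mu1, mu2, mu3, alpha, beta):
    """Transition-rate matrix Q for states 0..4 (assumed transition structure)."""
    Q = [[0.0] * 5 for _ in range(5)]
    Q[0][1], Q[0][2], Q[0][3] = lam1, lam2, beta   # unit failures, common cause
    Q[1][0], Q[1][3] = mu1, lam2                   # repair unit 1, or unit 2 fails
    Q[2][0], Q[2][3] = mu2, lam1                   # repair unit 2, or unit 1 fails
    Q[3][1], Q[3][2], Q[3][4] = mu2, mu1, alpha    # single repairs, or resources arrive
    Q[4][0] = mu3                                  # both units repaired together
    for i in range(5):
        Q[i][i] = -sum(Q[i])                       # diagonal = minus total exit rate
    return Q


def steady_state(Q):
    """Solve P Q = 0 with sum(P) = 1 by Gaussian elimination with pivoting."""
    n = len(Q)
    A = [[Q[j][i] for j in range(n)] for i in range(n)]  # transposed balance equations
    A[-1] = [1.0] * n                                    # replace last row: normalization
    b = [0.0] * (n - 1) + [1.0]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - s) / A[r][r]
    return x


def availability(P):
    """System is up while at least one unit operates (states 0, 1, 2)."""
    return P[0] + P[1] + P[2]
```

For example, `availability(steady_state(generator(0.002, 0.003, 0.05, 0.04, 0.02, 0.1, 0.0005)))` gives the steady-state availability for those illustrative rates; setting `beta` to zero shows how much of the unavailability is attributable to common-cause failures.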



4.14.3 A 1-Out-of-N:G System with Duplex Elements

This model incorporates standby duplex unit replacements and common-cause
failures [91]. When the operational duplex system (which contains two
statistically identical units) fails, it is replaced by one of the $(N - 1)$
standby duplex systems. Furthermore, this model allows for the possibility
that the repairmen or special repair tools needed to replace the failed
system may or may not be available at the time of the operational
system failure. This type of situation occurs at a nuclear plant where
a duplex system is replaced only when both units fail.
The following assumptions were made to develop this model:

1. A duplex system has two statistically identical units. All but one of the
duplex systems are cold standbys (units cannot fail).

2. Common-cause and other failures are statistically independent.

3. The operational system is replaced only when both units fail.

4. Operational units are independently identically distributed (i.i.d.),
except for common-cause failures.

5. A failed system is restored as good as new.

6. Cold standby systems: standby units cannot fail.

7. Failed duplex systems are never repaired.

8. When a system fails, two different possibilities are considered to replace
it with one of the standbys: (a) repairmen and special repair tools are
available; (b) repairmen and special repair tools are not available.

The notation for Figure 4.34 is as follows:

$n$ = total number of system states
$i$ = state of the system: both units are good, $i = 0, 4, 8, \ldots, (n-2)$; one
unit is good, one unit is bad, $i = 1, 5, 9, \ldots, (n-1)$; both units are
bad, no waiting, $i = 2, 6, 10, \ldots, n$; repairmen or special repair tool
waiting state, $i = 3, 7, 11, \ldots, (n-3)$
$\lambda$ = constant failure (hazard) rate of a single unit
$\beta$ = constant common-cause failure (hazard) rate of the duplex system
$\alpha$ = constant repairmen or special repair tool availability (hazard) rate
$\mu_j$ = constant replacement (hazard) rate of the failed duplex system
when repairmen and special repair tools are available (for $j = 1$) or
not available (for $j = 2$)

Figure 4.34 Transition diagram of the system. The star denotes down states.

The equations for the Figure 4.34 model [129] are:

$$P_0'(t) = -(2\lambda + \beta)P_0(t) \tag{4.164}$$

$$P_{k-4}'(t) = P_{k-5}(t)\,2\lambda - P_{k-4}(t)\,\lambda \tag{4.165}$$

$$P_{k-3}'(t) = P_{k-5}(t)\beta + P_{k-4}(t)\lambda - (\mu_1 + \alpha)P_{k-3}(t) \tag{4.166}$$

$$P_{k-2}'(t) = P_{k-3}(t)\alpha - \mu_2 P_{k-2}(t) \tag{4.167}$$

$$P_{k-1}'(t) = P_{k-3}(t)\mu_1 + \mu_2 P_{k-2}(t) - (2\lambda + \beta)P_{k-1}(t) \tag{4.168}$$

The above equations are valid for $k = 5, 9, 13, \ldots, (n-1)$. The initial
conditions are

$P_i(0) = 1$ for $i = 0$
$P_i(0) = 0$ for all other $i$

The prime denotes differentiation with respect to time $t$, and

$n = (4N - 2)$ for $N \geq 2$

The Laplace transforms of the end result are given by equations
(4.169)-(4.176).

To obtain the time domain solution, one should take the inverse Laplace
transforms of (4.171)-(4.176) for the known value of $N$.

4.14.4 A 4-Unit Redundant System with Common-Cause Failures

This mathematical model represents a 4-identical-unit system with
common-cause failures [87] where system repair times are arbitrarily
distributed. Therefore, the supplementary variable technique [123, 125, 126] is
used to develop equations for the model.
The following assumptions were made to develop this mathematical model:

1. Common-cause and other failures are statistically independent.

2. Common-cause failures can only occur with more than one unit.

3. Units are repaired only when the system fails. A failed system is
restored to like-new.

4. System repair times are arbitrarily distributed.

The transition diagram is shown in Figure 4.35.

Figure 4.35 Transition diagram of the system.

The following notation was used to develop the model equations:

$i$ = state of the unfailed system: number of failed units, $i = 0, 1, 2, 3$
$j$ = state of the failed system; $j = 4$ means failure not due to a
common cause; $j = 4,cc$ means failure due to a common cause
$P_i(t)$ = probability that the system is in unfailed state $i$ at time $t$
$p_j(y, t)$ = probability density (with respect to repair time) that the
failed system is in state $j$ and has an elapsed repair time of $y$
$\mu_j(y)$, $q_j(y)$ = repair rate (a hazard rate) and pdf of repair time when the
system is in state $j$ and has an elapsed repair time of $y$
$\beta_i$ = constant common-cause failure rate of the system when in
state $i$; $\beta_3 = 0$
$\lambda_i$ = constant failure rate of a unit, for other than common-cause
failures, when the system is in state $i$; $i = 0, 1, 2, 3$
$s$ = Laplace transform variable

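The equations for the model are

$$\left[\frac{d}{dt} + \lambda_0 + \beta_0\right]P_0(t) = \int_0^\infty p_4(y,t)\,\mu_4(y)\,dy + \int_0^\infty p_{4,cc}(y,t)\,\mu_{4,cc}(y)\,dy \tag{4.177}$$

$$\frac{dP_i(t)}{dt} + (\lambda_i + \beta_i)P_i(t) = \lambda_{i-1}P_{i-1}(t), \quad i = 1, 2, 3; \quad \beta_3 = 0 \tag{4.178}$$

$$\frac{\partial p_j(y,t)}{\partial t} + \frac{\partial p_j(y,t)}{\partial y} + \mu_j(y)\,p_j(y,t) = 0, \quad j = 4 \text{ or } 4,cc \tag{4.179}$$

$$p_4(0,t) = \lambda_3 P_3(t) \tag{4.180}$$

$$p_{4,cc}(0,t) = P_0(t)\beta_0 + P_1(t)\beta_1 + P_2(t)\beta_2 \tag{4.181}$$

$P_i(0) = 1$ for $i = 0$; otherwise $P_i(0) = 0$

$$p_j(y,0) = 0 \text{ for all } j \tag{4.182}$$

The setting up and solving of equations similar to the above are presented in
reference 125.

The Laplace transforms of the solution are

$$P_0(s) = \left[s + \lambda_0 + \beta_0 - \lambda_3 G_4(s)\prod_{k=1}^{3}\frac{\lambda_{k-1}}{s+\lambda_k+\beta_k} - G_{4,cc}(s)\sum_{i=0}^{2}\beta_i\prod_{k=1}^{i}\frac{\lambda_{k-1}}{s+\lambda_k+\beta_k}\right]^{-1} \tag{4.183}$$

where an empty product (for $i = 0$) equals 1 and

$$G_j(s) = \int_0^\infty \exp(-sy)\,q_j(y)\,dy \quad \text{for } j = 4 \text{ or } 4,cc$$

$$P_i(s) = P_0(s)\prod_{k=1}^{i}\frac{\lambda_{k-1}}{s+\lambda_k+\beta_k} \quad \text{for } i = 1, 2, 3 \tag{4.184}$$

$$P_4(s) = \frac{\lambda_3}{s}\,P_3(s)\,[1 - G_4(s)] \tag{4.185}$$

$$P_{4,cc}(s) = \frac{1 - G_{4,cc}(s)}{s}\sum_{i=0}^{2}\beta_i P_i(s) \tag{4.186}$$

To obtain the time domain solution of the above equations, one should
substitute the Laplace transforms of the repair time density functions for
$G_4(s)$ and $G_{4,cc}(s)$ and then take the inverse Laplace transforms of
(4.183)-(4.186).

5.1 INTRODUCTION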

Computers are finding an ever-increasing number of applications. The
expenditure on computer software is increasing faster than that on associated
hardware. One estimate indicates [1] that the annual expenditure of
the U.S. Air Force, for example, on computer hardware is $400 million and
the corresponding expenditure for software is estimated at $1500 million
per year. This ratio of about four to one is predicted to rise to nine to
one [5, 23]. With expenditures of this magnitude, it is natural that attention
should be directed to the proper development of software for computer
applications. One area on which considerable emphasis has been placed in
recent years is software reliability. This has come primarily with the advent
of large and complex hardware-software systems and the use of computers as
the heart of real-time applications to control vital and critical functions.
Undetected errors can cause system failures with catastrophic results; at
the same time, the size and complexity of software have increased, making
the process of debugging more difficult. Most of the work in the area of
software reliability can be divided into the following three categories:

1. Writing correct programs to begin with.

2. Testing the programs to take out the bugs.

3. Modeling of software in an attempt to predict its reliability and possibly
study the impact of related parameters.

These three areas are discussed in this chapter. It should be pointed out
that software reliability is in no way as highly developed as the discipline
of hardware reliability. Several useful concepts have, however, emerged,
and considerable work is still in progress.

5.2 HARDWARE AND SOFTWARE 

The discipline of hardware reliability is considerably older than that of
software. It is, therefore, natural to make a comparison between the two in
an effort to apply the large body of knowledge of reliability engineering to
software assurance. Several attempts [8, 13] have been made in this
direction. It appears that much can be learned from established reliability
engineering in organization and control procedures but that there are
significant differences when it comes to failure mechanisms. A hardware
component is assumed to have failed if its characteristics change beyond
the design values, either by drift or by catastrophic failure. A piece of
software, however, does not fail. If a program does not do what it is
supposed to do, it is because an error is present. The error has been there
all along, and when the segment of the program containing the error is
exercised, the error becomes manifest. This encountering of an error may or
may not cause a system to fail. Whereas the hardware undergoes a change at
the instant of failure, the software is really the same as it was before the
error was discovered.

The hardware reliability of a system can be improved by using two
identical components in a redundant manner. Two identical copies of a piece
of software, however, will be of little use in increasing the reliability,
since the same error will be exercised in both at the same time.

There is an important difference between hardware and software in
regard to the relationship between testing and reliability. If the software
could be tested for every conceivable input, then theoretically it should
never cause system failure. The hardware, on the other hand, could fail
even after having been tested in the most exhaustive manner.

A question that may be asked is, "Can the failure behavior of software
be regarded as random?" A program basically maps the elements of input
space into corresponding elements of output space [11]. A certain subset of
the input space would produce incorrect output. If we knew the output
behavior for every conceivable input and could predict the future inputs,
then we could predict the failures in a deterministic fashion. The properties
of a large piece of software, however, may never be known completely,
since it is almost impossible to test software for every conceivable input.
The input to the software is also random. With the uncertainty associated
with both the input and the software, randomness can be justified for the
occurrences of errors.



5.3 SOFTWARE RELIABILITY MODELS

Reliability models can be used to predict the reliability when the software
is put into operational use. Several models have been proposed [21] in the
literature, and a few are described here. The software reliability models use
the information on the number of errors debugged during the development
of a software program. This information is used to characterize the model
parameters, which can then be used to predict the number of failures or some
other measure of reliability in the future. The software reliability can be
defined as the probability of a given software operating for a specified time
period on the appropriate machine.



5.3.1 Shooman Model

The model proposed by Shooman [17, 18] is based on the following
assumptions:

1. The total number of machine language instructions in the software
program is constant.

2. The number of errors at the start of integration testing is constant and
decreases directly as errors are corrected. No new errors are introduced
during the process of testing.

3. The difference between the errors initially present and the cumulative
errors corrected represents the residual errors.

4. The failure rate is proportional to the number of residual errors.

Based on these assumptions [17],

$$\varepsilon_r(x) = \varepsilon(0) - \varepsilon_c(x) \tag{5.1}$$

where $x$ = debugging time since the start of system integration
$\varepsilon(0)$ = errors present at $x = 0$, normalized by the total number of
machine language instructions
= $E_0/I$
$E_0$ = number of initial errors
$I$ = total number of machine language instructions
$\varepsilon_c(x)$ = cumulative number of errors corrected by $x$, normalized
by $I$
$\varepsilon_r(x)$ = residual errors at $x$, normalized by $I$

Assuming the failure rate to be proportional to the residual errors
(assumption 4),

$$\lambda_s(t) = K_s\,\varepsilon_r(x) \tag{5.2}$$

where $t$ = operating time of the system
$K_s$ = constant of proportionality
$\lambda_s(t)$ = failure rate at time $t$

The corresponding reliability function is

$$R(t) = \exp\left[-\int_0^t K_s\,\varepsilon_r(x)\,dt\right] \tag{5.3}$$

Since the hazard rate is assumed independent of $t$ in this model, this
assumption amounts to a constant failure rate, and therefore,

$$\text{MTTF} = \frac{1}{K_s\,\varepsilon_r(x)} \tag{5.4}$$



Estimation of Model Parameters. By substituting for $\varepsilon_r(x)$ from (5.1) into
(5.4), the expression for MTTF can be written as follows:

$$\text{MTTF} = \frac{1}{K_s[\varepsilon(0) - \varepsilon_c(x)]} = \frac{1}{K_s[E_0/I - \varepsilon_c(x)]} \tag{5.5}$$

There are two unknowns in (5.5), $K_s$ and $E_0$. These parameters can be
estimated using the moment matching method [20]. Considering two
debugging periods $x_1$ and $x_2$ such that $x_1 < x_2$,

$$\frac{T_1}{n_1} = \frac{1}{K_s[E_0/I - \varepsilon_c(x_1)]} \tag{5.6}$$

and

$$\frac{T_2}{n_2} = \frac{1}{K_s[E_0/I - \varepsilon_c(x_2)]} \tag{5.7}$$

where $T_1$, $T_2$ = system operating times corresponding to $x_1$ and $x_2$,
respectively
$n_1$, $n_2$ = number of software errors during $x_1$ and $x_2$, respectively

From (5.6) and (5.7)

$$E_0 = \frac{I[\gamma\,\varepsilon_c(x_1) - \varepsilon_c(x_2)]}{\gamma - 1} \tag{5.8}$$

where

$$\gamma = \frac{T_1/n_1}{T_2/n_2} = \frac{\text{MTTF}_1}{\text{MTTF}_2}$$

and $\text{MTTF}_i$ = the mean time to software failure corresponding to
debugging time $x_i$.

By substituting for $E_0$ from (5.8) into (5.6),

$$K_s = \frac{n_1}{T_1[E_0/I - \varepsilon_c(x_1)]} = \frac{n_1(\gamma - 1)}{T_1[\varepsilon_c(x_1) - \varepsilon_c(x_2)]} \tag{5.9}$$

An alternative method for estimation of $E_0$ and $K_s$ is by using maximum
likelihood estimates and is discussed in reference 17.
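
The moment matching estimates can be sketched in a few lines of Python. This is a minimal illustration of equations (5.5)-(5.9), assuming the symbol $K_s$ for the proportionality constant; the function and argument names are hypothetical, and the numbers in the usage note are invented for illustration only.

```python
def shooman_estimate(T1, n1, eps_c1, T2, n2, eps_c2, I):
    """Moment-matching estimates of E0 and Ks from two debugging periods.

    T1, T2         : system operating times for periods x1 < x2
    n1, n2         : software errors observed during those operating times
    eps_c1, eps_c2 : cumulative corrected errors at x1, x2, normalized by I
    I              : total number of machine language instructions
    """
    gamma = (T1 / n1) / (T2 / n2)                        # MTTF_1 / MTTF_2
    if gamma == 1.0:
        raise ValueError("equal MTTFs: E0 cannot be identified")
    E0 = I * (gamma * eps_c1 - eps_c2) / (gamma - 1.0)   # equation (5.8)
    Ks = n1 / (T1 * (E0 / I - eps_c1))                   # equation (5.9)
    return E0, Ks


def shooman_mttf(Ks, E0, I, eps_c):
    """MTTF after the normalized number eps_c of errors is corrected, eq. (5.5)."""
    return 1.0 / (Ks * (E0 / I - eps_c))
```

As a usage example, with `I = 10000` instructions, 30 errors corrected by $x_1$ and 60 by $x_2$ (so `eps_c1 = 0.003`, `eps_c2 = 0.006`), and observations `T1 = 2000, n1 = 7, T2 = 2000, n2 = 4`, calling `shooman_estimate(2000, 7, 0.003, 2000, 4, 0.006, 10000)` returns the estimated initial error count and proportionality constant, which can then be passed to `shooman_mttf` to project the MTTF after further debugging.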

5.3.2 The Markov Model

This model [19, 22] assumes the system to go through a sequence of "up"
and "down" states. The system state is termed "up" if the first error since
the start of integration and testing has not yet occurred or if the system has
been restored after an error and the next error has yet to be encountered.
The down state implies that an error has occurred and has not been
corrected. The state transition diagram of this model is shown in Figure
5.1, in which the following notation is used:

1. State $(n-k)$ indicates that the $k$th bug has been corrected and that the
$(k+1)$th error is yet to occur. This is the up state following the down state
due to the $k$th bug.

2. State $(m-k)$ is entered when the $(k+1)$th bug is discovered. This is the
down state due to the $(k+1)$th error.

3. $\lambda_k$ is the error occurrence rate when the system is in state $(n-k)$.

4. $\mu_k$ is the error correction rate when the system is in down state $(m-k)$.

5. $P_j(t)$ is the probability of the system being in state $j$ at time $t$.

The state differential equations for this system can be easily formulated
using known methods [20]:

$$P_{n-k}'(t) = -\lambda_k P_{n-k}(t) + \mu_{k-1}P_{m-k+1}(t), \quad k = 0, 1, 2, \ldots \tag{5.10}$$

$$P_{m-k}'(t) = -\mu_k P_{m-k}(t) + \lambda_k P_{n-k}(t), \quad k = 0, 1, 2, \ldots \tag{5.11}$$

where the term in $\mu_{k-1}$ is absent for $k = 0$. The initial conditions are

$$P_n(0) = 1, \qquad P_{n-k}(0) = 0, \quad k = 1, 2, 3, \ldots \tag{5.12}$$

$$P_{m-k}(0) = 0, \quad k = 0, 1, 2, \ldots \tag{5.13}$$

Figure 5.1 The Markov model.

A restrictive solution of (5.10) and (5.11) assuming constant $\lambda_k = \lambda$ and
constant $\mu_k = \mu$ is derived in [22]. This constraint on $\lambda_k$ and $\mu_k$ is,
however, easily seen to be unrealistic. A more general solution can be obtained
using numerical techniques like Euler's and Runge-Kutta methods for
integration. Once the state probabilities have been obtained, unavailability
is calculated as [19, 22]

$$U(t) = \sum_{k=0}^{k_{\max}} P_{m-k}(t) \tag{5.14}$$

The probabilities will depend on the choice of $k_{\max}$. By choosing $k_{\max}$
large enough, $U(t)$ can be made close to the true value of $U(t)$.
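
The numerical route mentioned above can be sketched as follows. This is a minimal illustration, not the procedure of references 19 or 22: it assumes the up/down transition structure described in the notation (up state $(n-k)$ goes to down state $(m-k)$ at rate $\lambda_k$, which returns to up state $(n-k-1)$ at rate $\mu_k$), truncates the chain after `len(lam)` up/down pairs, and integrates with a classical fourth-order Runge-Kutta step. The function names and the rate lists are hypothetical.

```python
def rhs(y, lam, mu):
    """Time derivatives of [up_0..up_K-1, down_0..down_K-1] for the truncated chain."""
    K = len(lam)
    up, down = y[:K], y[K:]
    dy = [0.0] * (2 * K)
    for k in range(K):
        inflow = mu[k - 1] * down[k - 1] if k > 0 else 0.0  # correction of bug k
        dy[k] = -lam[k] * up[k] + inflow                    # up state (n-k)
        dy[K + k] = -mu[k] * down[k] + lam[k] * up[k]       # down state (m-k)
    return dy


def rk4_step(y, h, lam, mu):
    """One classical Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]
    k1 = rhs(y, lam, mu)
    k2 = rhs(shifted(y, k1, h / 2), lam, mu)
    k3 = rhs(shifted(y, k2, h / 2), lam, mu)
    k4 = rhs(shifted(y, k3, h), lam, mu)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


def unavailability(t_end, lam, mu, h=0.01):
    """Sum of the tracked down-state probabilities at time t_end."""
    K = len(lam)
    y = [1.0] + [0.0] * (2 * K - 1)        # system starts in the first up state
    for _ in range(int(round(t_end / h))):
        y = rk4_step(y, h, lam, mu)
    return sum(y[K:])
```

Because probability that leaves the last tracked down state is dropped, the result is a truncated sum; lengthening `lam` and `mu` (that is, increasing the number of tracked error cycles) brings it closer to the untruncated value, which is the point made about the choice of the truncation index.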

5.3.3 Jelinski-Moranda Model

This model [14, 15, 21], like the Shooman model, assumes an exponential
probability density function for software errors. The hazard rate is
assumed to be proportional to the number of remaining errors, that is,

$$\lambda(x_i) = K_{JM}[E_0 - (i - 1)] \tag{5.15}$$

where $K_{JM}$ = constant of proportionality
$x_i$ = time between the $i$th and $(i-1)$st errors discovered

The reliability function and the mean time to failure can be obtained
[14] from (5.15):

$$R(t_i) = \exp[-K_{JM}(E_0 - i + 1)\,t_i] \tag{5.16}$$

and

$$\text{MTTF} = \frac{1}{K_{JM}(E_0 - i + 1)} \tag{5.17}$$

5.3.4 Schick-Wolverton Model



The Schick-Wolverton model [24] assumes the hazard rate to be proportional to
the number of remaining errors and the debugging time:

$$\lambda_{SW}(x_i) = K_{SW}[E_0 - (i - 1)]\,x_i \tag{5.18}$$

where $x_i$ = time interval between the $(i-1)$st and the $i$th error.

The reliability function is

$$R(t_i) = \exp\left[-\int_0^{t_i}\lambda_{SW}(x_i)\,dx_i\right] = \exp\left[-K_{SW}(E_0 - i + 1)\,t_i^2/2\right] \tag{5.19}$$

and the mean time to failure is

$$\text{MTTF} = \int_0^\infty \exp\left[-K_{SW}(E_0 - i + 1)\,t_i^2/2\right]dt_i \tag{5.20}$$

It could possibly be argued, both for and against, having the hazard rate
proportional to debugging time. Probably the only way to judge the
suitability of this model is by fitting it to the experimental data.
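
To compare the two hazard-rate assumptions numerically, the following minimal Python sketch evaluates the reliability and MTTF of the Jelinski-Moranda and Schick-Wolverton models. The function names are hypothetical. The closed form used for the Schick-Wolverton MTTF is the standard Gaussian integral $\int_0^\infty e^{-at^2/2}\,dt = \sqrt{\pi/(2a)}$ applied to (5.20); it is stated here as a convenience and is an assumption about how (5.20) would be evaluated, not a formula quoted from reference 24.

```python
import math


def jm_reliability(K_jm, E0, i, t):
    """Jelinski-Moranda reliability over interval length t before the i-th error."""
    return math.exp(-K_jm * (E0 - i + 1) * t)


def jm_mttf(K_jm, E0, i):
    """Jelinski-Moranda mean time to the i-th error (constant hazard)."""
    return 1.0 / (K_jm * (E0 - i + 1))


def sw_reliability(K_sw, E0, i, t):
    """Schick-Wolverton reliability: hazard grows linearly with time t."""
    return math.exp(-K_sw * (E0 - i + 1) * t * t / 2.0)


def sw_mttf(K_sw, E0, i):
    """Schick-Wolverton MTTF via the Gaussian integral of the reliability."""
    return math.sqrt(math.pi / (2.0 * K_sw * (E0 - i + 1)))
```

In both models the MTTF grows as `i` approaches `E0`, that is, as fewer errors remain; the difference is that the Jelinski-Moranda MTTF scales as the inverse of the number of remaining errors, while the Schick-Wolverton MTTF scales as its inverse square root.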



5.4 MODEL VALIDATION 

Four models have been described in this chapter and several more have
been proposed in the literature [21]. In addition to the models described in
reference 21, Bayesian models have also been proposed [10]. The true
worth of a model can be measured by its ability to predict. Most of the
discussion on the relative worth of the models is generally based on
intuition and logical consistency. Because of the scarcity of data on
software errors and lack of consistency in the available data, only a few
attempts have been made for the experimental validation of these models.
One such attempt has been reported in the literature [21], wherein a
comparative study of the four models described in this chapter and five
more models is described.

The error data used by Sukert [21] came from Software Problem Reports
(SPR's) during the software development of a large command and control
system. The software was written in Jovial J4 code and consisted of 249
routines with a total of 115,000 lines. Although some internal tools such as
a static code analyzer were used by the contractor for software development,
no techniques like structured programing were used.

The data was restructured so that each entry corresponded to a single
error, and entries due to nonsoftware errors were deleted. The data was then
sorted according to the date of opening an SPR so as to provide a
sequential time frame suitable for input to the models. The data on CPU
time was not available from this project. A day was considered as the basic
unit of debugging interval length.

Because of the unavailability of CPU data, the Shooman model could
not be used. The other three models, along with some modifications of
these and several other models, were compared and the following
conclusions were drawn [21]:

1. The Jelinski-Moranda and Schick-Wolverton models consistently gave
higher predictions for the number of remaining errors than the actual
number, that is, the prediction is conservative or pessimistic with these
models.

2. For small software projects or where the testing phase is short, the
Jelinski-Moranda and Schick-Wolverton models appear to give a reasonable
prediction for the number of remaining errors.

3. Of all the models studied, the Schick-Wolverton model or a modified
version of the Jelinski-Moranda model appears to give the best prediction
for the remaining errors for large projects or projects with a long testing
phase.

It should be remembered that even though this comparative study has
produced some useful results, many more studies of this kind are needed.



5.5 SOFTWARE RELIABILITY ASSURANCE AND 
IMPROVEMENT 

Modeling is only one aspect of software reliability and is intended to
predict the number of bugs remaining in the system by using the statistical
information on discovering and removing the errors. There are, however,
two equally, and perhaps more, important areas of software reliability.
These additional areas can be described as (a) designing for reliability, and
(b) testing for reliability assurance. These two topics are discussed in this
section.

5.5.1 Designing for Reliability

Probably the best way to have reliable software is to minimize the number
and severity of bugs while a software package is being developed. There
does not appear to be any proven best way of producing reliable software.
There is as yet no theoretical framework for techniques for turning out
error-free software. However, there appears to be an emerging consensus
that certain program structuring and management techniques are
conducive to developing reliable software. These techniques are usually
referred to as structured programing and several techniques related to it.

Structured Programing. Several definitions of structured programing are
in circulation. A more generally accepted definition of this approach
appears to be coding that avoids use of GO TO statements and program
design that is TOP DOWN and modular. These three features appear to
enhance program reliability, readability, and maintainability.

Top Down Programing. There are basically two ways of program design,
bottom up and top down. The classical way of writing large programs is
bottom up. In this approach, the program manager views the project as a
whole, determines its system architecture, and then specifies the components
needed for the software. The interfaces are specified and the component
softwares are allocated to individual programers for development. Each
programer is responsible for testing his subassembly or module before it
goes into integration. The modules are integrated level by level by the most
capable member of the group whose modules are being integrated. This
manner of software development is similar to the one used for hardware
development.

It appears, however, that an alternative approach of software development
in the top down manner yields more reliable software [2]. Here the
chief programer programs instead of providing supervision alone. The core
of the system is written first, assuming a dummy subassembly at the next
level. These subassemblies are developed next in a likewise manner. In
comparing these two approaches, an analogy with the chief surgeon is
often drawn. The top down approach is like the best surgeon doing the
most important or fundamental surgery himself and coordinating the less
essential work performed by others.

GO TO Free Coding. Dijkstra published a note in 1968 in CACM [6]
entitled "Go To statement considered harmful." The title of this
communication seems to have had a wide-ranging effect on contemporary
programing techniques. The GO TO statement does not create errors by itself.
It is the transfer of control that can tangle the flow of logic so that
the code becomes difficult to read. The avoidance of GO TO statements,
on the other hand, creates more transparent and readable code. GO TO free
programs are also more straightforward to prove.

Now if it were conceded that GO TO statements should not be allowed
in programing, what is the alternative? It has been shown [4] that any flow
chart can be constructed using only the single-entry single-exit structures
shown in Figure 5.2. These three control structures can be used to write
programs that will be free of GO TO statements and in which the program
text will correspond more closely to the program execution [3]. This is best
illustrated using an example. Figure 5.3 is an example of a program written
using GO TO statements, and the same program written using the control
structures of Figure 5.2 is shown in Figure 5.4. It can be appreciated that
whereas the code in Figure 5.3 jumps around the page, the one in Figure
5.4 follows a sequencing process. Imagine a large piece of software written
in the manner of Figure 5.3; it could be hard to follow and readily
understand, whereas the code written in the manner of Figure 5.4 is more
transparent. Such code is not only readily understood, but the programer
is also less likely to make errors.

Figure 5.2 Single-entry single-exit logic structures.

      A
      IF B GO TO 10
      D
      GO TO 30
   10 E
      GO TO 30
   30 G

Figure 5.3 Program containing GO TO statements.

      A
      IF B THEN E
           ELSE D
      G

Figure 5.4 Program of Figure 5.3 without GO TO statements.
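
The equivalence of the two listings can be checked with a small Python sketch. Python has no GO TO statement, so the jumping version is emulated here with an explicit label variable; this emulation, the function names, and the reading of the final statement as G are all assumptions made for illustration. Each statement simply records its name so that the two execution orders can be compared.

```python
def run_goto_style(b):
    """Emulates the jump-based listing: control moves between labels."""
    trace, label = [], "start"
    while label != "end":
        if label == "start":
            trace.append("A")
            label = "10" if b else "fallthrough"   # IF B GO TO 10
        elif label == "fallthrough":
            trace.append("D")
            label = "30"                           # GO TO 30
        elif label == "10":
            trace.append("E")
            label = "30"                           # GO TO 30
        elif label == "30":
            trace.append("G")
            label = "end"
    return trace


def run_structured(b):
    """Same program using only sequence and selection."""
    trace = ["A"]
    if b:
        trace.append("E")
    else:
        trace.append("D")
    trace.append("G")
    return trace
```

Both functions produce the same statement order for either value of `b`, but in `run_structured` the order of the text on the page matches the order of execution, which is the property the structured form is meant to provide.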

Modular Programs. Programers normally break down a complex software
development task into separate modules. A module is used by many other
modules. This also, however, increases the potential for misunderstanding
and errors. It appears that this source of errors can be minimized if every
module is entered only at the top and left at the bottom.

Related Ideas. The techniques of structured programing and the concepts
of modularity and top down flow have been described. These techniques
have been found of much value in minimizing the number of errors in
writing programs [22]. There are other related ideas like development
accounting and the concept of a program librarian, which merit
consideration, and the reader may find references 17 and 22-23 of interest.

5.5.2 Testing 

Even with the best programing techniques, it is likely that a piece of
software will contain some bugs. The purpose of testing is to assure that
the program performs according to the specification design. Test tools have
been developed to assist in assessing this assurance in computer programs.
These tools basically provide some numerical measure of the thoroughness 
with which the testing was conducted. Several such tools are available and 
a comparative investigation into the effectiveness of these tools is reported 
in reference 16. 

The test tools consist of the following basic modules: (a) instrumentation 
module, (b) analyzer module. 

The source program of the module under test is first submitted to the
instrumentation module, which inserts additional statements into the
module. These additional statements are called sensors and counters [16] and
the process of adding these statements is called instrumentation. The
functional intent of the original code must remain unchanged during the
process of instrumentation, that is, the sensor and counter statements must
not change the functional objectives of the program.

The instrumented package is compiled in the usual manner and the
object package is executed with its test data, which results in an
instrumentation data file in addition to the normal output. This
instrumentation data file and the instrumented source file are then submitted
to the analyzer module, which produces a report indicating the behavior of
the module during execution. The following type of information is contained
in such a report:

1. The number of times each statement has been executed. 

2. At each branch point, how many times a particular path has been 
taken. 

3. Time for executing each statement. 
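
A report of the first two kinds listed above can be sketched in Python. This is only an illustration: instead of inserting sensor and counter statements into the source text, it uses the interpreter's trace hook (`sys.settrace`) to count line executions, which is a different mechanism with the same observable result. The helper `run_with_counters` and the sample function `classify` are hypothetical; counts are keyed by line offset from the `def` line.

```python
import sys
from collections import Counter


def run_with_counters(func, *args):
    """Run func(*args) and count how often each of its source lines executes."""
    code = func.__code__
    counts = Counter()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                      # do not trace other functions
        if event == "line":
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)               # restore whatever hook was active
    return result, counts


def classify(values):
    small = 0
    large = 0
    for v in values:
        if v < 10:
            small += 1
        else:
            large += 1
    return small, large
```

Calling `run_with_counters(classify, [1, 20, 3])` returns the normal result together with the per-line counts; the counts on the two branch bodies show how often each path of the `if` was taken, and a line with a zero count indicates a statement the test data never exercised.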

This information is useful in checking the structure of the code. It provides
confidence in the logic and code of the program by ensuring that each
statement and each branch path has been executed at least once. It is also
possible to ensure that each subroutine has been called once. As can be
inferred from the description of the testing process, these test tools can be
very useful in discovering and reducing sequencing and control errors,
which account for approximately 20 percent of the total number of errors
[16]. These structural analysis tools, however, do not test the timing and
data relationships. In addition to structural testing, functional testing is
also necessary if the software is time critical.

6.1 INTRODUCTION

The concept of constant failure rate is used to evaluate electronic
component reliability. This concept is derived from the bathtub hazard rate
concept, according to which the failure rate remains constant during the
useful life of electronic components. However, this is not normally the case
when evaluating mechanical component reliability. It is an established fact
that in many cases mechanical components follow an increasing failure rate
pattern that is generally represented by the exponential hazard function.

The field of mechanical reliability is relatively new as compared to
electronic reliability. The in-depth effort in this field appears to have
started in the early 1960s and may be credited to the U.S. space
program. During those years, the failure of mechanical and
electromechanical components was one of the prime concerns of NASA (National
Aeronautics and Space Administration). For example, SYNCOM I is believed
to have been lost in space in 1963 due to a mechanical failure caused by a
bursting high-pressure tank. Another typical example is the failure of
Mariner III in 1964. It is also believed to have been lost due to a
mechanical failure. There are several other instances where systems had
mechanical failures. The researchers in the field felt that design
improvements were needed to improve the reliability and longevity of
mechanical and electromechanical components. Therefore, the space agency spent
millions of dollars to test, replace, and redesign components such as filters,
pressure switches, pressure gauges, mechanical valves, and actuators.

In 1965 NASA [80] initiated some major research projects entitled:

1. Reliability demonstration using overstress testing.

2. Reliability of structures and components subjected to random dynamic
loading.

3. Designing specified reliability levels into mechanical components with
time-dependent stress and strength distributions.

Since then many publications on the subject have appeared. An up-to-date
but selective list of the literature on the subject is given at the end of
this chapter. In addition, a comprehensive literature survey up to 1974 on
structural reliability is presented in reference 63.

At present, the most acceptable way of predicting mechanical component
reliability may be by applying the interference theory. This approach
is well documented in references 49 and 50. The topics presented in this
chapter are as follows:

1. Statistical distributions in mechanical reliability.

2. Fundamentals of mechanical reliability.

3. Mechanical equipment basic failure modes.

4. Theory of mechanical failures.

5. Safety indices.

6. Load factors.

7. Design by reliability methodology.

8. Interference theory models.

9. Reliability optimization.

6.2 STATISTICAL DISTRIBUTIONS IN MECHANICAL 
RELIABILITY 

This section presents failure distributions useful for representing the failure 
behavior of mechanical components. As compared to other distributions 
the extreme value distribution is the most likely candidate for the failure 
behavior of mechanical components. Its examples are presented in refer- 
ences 28 and 44. 

The distributions discussed in the following sections are closely related
to the reliability evaluation of mechanical components:

6.2.1 The Exponential Distribution

The probability density function is represented by the equation:

$$f(t) = \lambda \exp(-\lambda t), \qquad t \ge 0,\ \lambda > 0 \tag{6.1}$$

where $t$ is time and $\lambda$ is the constant failure rate.

The reliability function $R$ and hazard rate $z$ of the exponential
distribution are:

$$R(t) = \exp(-\lambda t) \tag{6.2}$$

and

$$z(t) = \lambda \tag{6.3}$$

This distribution is widely used in reliability engineering. One of the
reasons for its widespread use is its simplicity in performing reliability
analysis. Its validity for representing real-life failure data was first
presented in reference 19.

6.2.2 The Extreme Value Distribution

The density function $f$ of this distribution is defined by

$$f(t) = \exp(t)\,\exp[-\exp(t)], \qquad -\infty < t < \infty \tag{6.4}$$

where $t$ is time. The extreme value reliability and hazard rate functions,
respectively, are

$$R(t) = \exp[-\exp(t)] \tag{6.5}$$

and

$$z(t) = \exp(t) \tag{6.6}$$

This distribution was first used to analyze flood data by Gumbel [28].
Therefore, it is sometimes known as the Gumbel distribution. The failure
behavior of many mechanical components may be represented by this
distribution. From more fundamental considerations, this distribution can
be developed by considering a corrosion process [66].

6.2.3 The Weibull Distribution

The Weibull density function is given by

$$f(t) = \beta \lambda t^{\beta - 1} e^{-\lambda t^{\beta}}, \qquad \beta > 0,\ \lambda > 0,\ t \ge 0 \tag{6.7}$$

where $\lambda$ = the scale parameter
      $\beta$ = the shape parameter
      $t$ = time

Weibull reliability and hazard functions are

$$R(t) = e^{-\lambda t^{\beta}} \tag{6.8}$$

and

$$z(t) = \beta \lambda t^{\beta - 1} \tag{6.9}$$

This distribution was developed by Weibull [99], who described some of its
applications. Applications to ball bearing failures are given in reference 64.






The exponential ($\beta = 1$) and Rayleigh ($\beta = 2$) distributions are
special cases of this distribution.
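
The special cases can be checked numerically. The following is a minimal sketch, assuming the Weibull parameterization used in this section, with reliability exp(-lambda * t**beta) and hazard rate beta * lambda * t**(beta - 1). The function names and the parameter values in the demonstration are illustrative assumptions, not taken from the references.

```python
import math


def weibull_pdf(t, lam, beta):
    """Density f(t) = beta * lam * t**(beta - 1) * exp(-lam * t**beta), t > 0."""
    return beta * lam * t ** (beta - 1) * math.exp(-lam * t ** beta)


def weibull_reliability(t, lam, beta):
    """Reliability R(t) = exp(-lam * t**beta)."""
    return math.exp(-lam * t ** beta)


def weibull_hazard(t, lam, beta):
    """Hazard rate z(t) = f(t) / R(t) = beta * lam * t**(beta - 1)."""
    return beta * lam * t ** (beta - 1)


# Illustrative (assumed) values: lam = 0.2 per unit time.
for shape in (1.0, 2.0):
    rates = [weibull_hazard(t, 0.2, shape) for t in (1.0, 2.0, 4.0)]
    print("beta =", shape, "hazard at t = 1, 2, 4:", rates)
```

With beta = 1 the hazard rate stays at lam for every t (the exponential case), and with beta = 2 it grows linearly with t (the Rayleigh case).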



6.2.4 The Mixed Weibull Distribution

This distribution was first presented by Kao [43], who applied it to measure
the reliability of electron tubes. The probability density function is
defined as

♦'WfrM-(*r) <-> 

for /J|,ft>O,O<a l <I, ( i I >l,0:>o,O<rt<: I 
The reliability expression, for the above density function is 



~ exp (i)"]- {i -^['-^{-(xr}] (6n) 



6.2.5 The Gamma Distribution

The gamma probability density function is defined as

$$f(x) = \frac{\lambda^{\beta} x^{\beta - 1} e^{-\lambda x}}{\Gamma(\beta)}, \qquad x \ge 0,\ \beta > 0,\ \lambda > 0 \tag{6.12}$$

where $\Gamma(\beta) = \int_0^{\infty} x^{\beta - 1} e^{-x}\,dx$
      $\beta$ = the shape parameter
      $\lambda$ = the scale parameter

The reliability and hazard rate expressions are

$$R(x) = \frac{\lambda^{\beta}}{\Gamma(\beta)} \int_x^{\infty} y^{\beta - 1} \exp(-\lambda y)\,dy \tag{6.13}$$

and

$$z(x) = \frac{f(x)}{R(x)} = \frac{x^{\beta - 1} e^{-\lambda x}}{\int_x^{\infty} y^{\beta - 1} e^{-\lambda y}\,dy} \tag{6.14}$$

This distribution is an extended version of the exponential distribution. It
was applied to life test problems by Gupta and Groll [26].
The gamma distribution is related to the exponential and chi-squared
distributions. For its applications one should consult reference 57.

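6.2.6 The Log-Normal Distribution

The probability density function is

$$f(t) = \frac{1}{(t - \theta)\,\sigma\sqrt{2\pi}} \exp\left\{-\frac{[\ln(t - \theta) - \mu]^2}{2\sigma^2}\right\}, \qquad t > \theta > 0 \tag{6.15}$$

where $\mu$ = the mean of $\ln(t - \theta)$
      $\sigma$ = the standard deviation of $\ln(t - \theta)$

The reliability and hazard rate expressions for the above function are
given by

$$R(t) = \frac{1}{\sigma\sqrt{2\pi}} \int_t^{\infty} \frac{1}{y - \theta} \exp\left\{-\frac{[\ln(y - \theta) - \mu]^2}{2\sigma^2}\right\} dy, \qquad t > \theta \tag{6.16}$$

and

$$z(t) = \frac{f(t)}{R(t)} = \frac{\dfrac{1}{t - \theta} \exp\left\{-\dfrac{[\ln(t - \theta) - \mu]^2}{2\sigma^2}\right\}}{\displaystyle\int_t^{\infty} \frac{1}{y - \theta} \exp\left\{-\frac{[\ln(y - \theta) - \mu]^2}{2\sigma^2}\right\} dy} \tag{6.17}$$

Normally, the hazard rate of this distribution is an increasing function
of time followed by a decreasing function. The hazard rate approaches
zero for initial and infinite times. A representative example of this
distribution is failures due to fatigue cracks.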



6.2.7 The Fatigue Life Distribution Models

These distribution models were presented by Birnbaum and Saunders [9],
who proposed two-parameter distributions. The main application of this
family of distributions is to characterize failures due to fatigue.
The probability density function is defined as

$$f(t) = \frac{1}{2\sqrt{2\pi}\,\alpha \lambda t^2} \cdot \frac{t^2 - \lambda^2}{(t/\lambda)^{1/2} - (\lambda/t)^{1/2}} \exp\left[-\frac{1}{2\alpha^2}\left(\frac{t}{\lambda} + \frac{\lambda}{t} - 2\right)\right], \qquad t > 0,\ \alpha, \lambda > 0 \tag{6.18}$$

where $\alpha$ and $\lambda$ are the shape and scale parameters, respectively.
Readers requiring in-depth material on these distributions should consult
references 27, 44, and 91. Other hazard rate models are presented in
references 67 and 91.

6.3 FUNDAMENTALS OF MECHANICAL RELIABILITY

Like any other field of reliability engineering, mechanical reliability is also 
a joint responsibility of design and reliability engineers. A reliability 
engineer augments the designer's knowledge with design review procedures 
and statistical analysis; however, the designer still remains the key person 
to ensure component or system reliability. 

The old concept of merely good design practices is not satisfactory to 
ensure reliability of a complex system. Reference 72 lists several reasons 
for the discipline of mechanical reliability. 

1. Lack of design experience. Changes in technology are quite rapid and
mechanical designers no longer have the time to master the design,
especially when complex equipment is designed for use in aerospace
or military applications.

2. Cost and time constraints. Because of the cost and time involved, the
designer cannot learn from past mistakes. In other words, the cut-and-try
approach cannot be used.

3. Optimization of resources. The workable design is no longer considered
sufficient. The design must be optimized subject to constraints such as
reliability, cost, weight, performance, and size.

4. Stringent requirements and severe environments. Because of large-scale
investments in developing systems to be used under severe environments
(military and space) the reliability problem becomes important.

5. Influence from electronic reliability. The vastly improved techniques for
predicting electronic component reliability also stimulated similar
developments in mechanical engineering.



6.4 MECHANICAL EQUIPMENT BASIC FAILURE MODES 

Unlike electronic components, mechanical components have numerous
failure modes. Some of the basic failure modes pertaining to mechanical
equipment are fatigue, leakage, wear, thermal shock, creep, impact,
corrosion, erosion, lubrication failure, elastic deformation, surface
fatigue, radiation damage, spalling, corrosion wear, delamination, and
buckling. These basic failure modes are described in detail in reference 72.
Some of these failure modes may be associated with the following:

1. Leakage and distorted flow failure modes are associated with fluid
flow equipment.

2. The principal failure modes associated with a structural system are
fracture and excessive deflection.

3. Overheating and reduction of efficiency may be categorized as the
thermodynamic system failure modes.

4. Bearing seizure and reduced accuracy of relative movement pertain to
kinematic systems.

5. Incorrect material properties and incorrect component geometry are
called the material conversion failure modes.

6.5 THEORY OF FAILURES

When the strength of a material, a component, or a device is less than the
stress imposed on it, failure occurs. Stress and strength are defined as
follows:

Stress. A stress (load) tends to produce a failure of a component, a
device, or a material. The term "load" may be defined as mechanical load,
environment, temperature, electrical current, and so on.

Strength. Strength is defined as the ability of a component, a device, or a 
material to accomplish its required mission satisfactorily without a failure 
when subject to the external loading and environment. 

Both stress and strength may be described by probability distributions. 
All types of stresses and strengths cannot, however, be represented by the 
existing distributions. 

Because of the variation in material properties (e.g., production processes,
geometric dimensions) the strength of nominally identical components
subject to the same conditions may vary from component to component.
The variability may be described by a distribution function. All the
important variabilities (and their distributions) of a component must be
considered and known (or assumed) when estimating the expected strength
distribution function of a component. The methods to predict the expected
strength distribution from the variability distributions are presented in
references 72, 11, 15, 96, 59, 100, and 76.

It is always desirable to have a narrow spread of the strength distribution
because a narrow distribution yields a higher reliability than its
counterpart, which is widely spread out and is of the same mean value.
Therefore, efforts should always be directed toward obtaining a narrow
strength distribution; however, there is always a limit to how narrow it
can be made, because the strength of a mechanical component or a material
is generally reduced by fatigue, corrosion, and wear, which are factors that
increase the spread of the strength distribution. One should note that these
factors take time to become effective. Therefore, it must be understood that
the strength distribution is a function of time. Similarly, the stress
distribution also changes under different conditions like use, maintenance,
environment, and so on. The duty or the stress distribution for a component
under controlled laboratory environments or conditions remains constant.

Figure 6.1 Interference theory of stress-strength distribution concept.

If the expected distributions of stress and strength can be estimated for a
mechanical part, then by employing interference theory, the probability of
failure of a mechanical part can be obtained. This concept is presented in
detail in references 45 and 48-56.

The concept of interference is illustrated in Figure 6.1. The unreliability
or the probability of failure is represented by the shaded area in Figure 6.1.

The interference theory is applicable only to those cases in which no
significant changes occur in the item over the specified time interval.
Furthermore, it is assumed that the failure is dependent on the
instantaneous stress and not on the history of the stress.
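
For the special case in which stress and strength are independent and normally distributed, the interference probability has a well-known closed form: the reliability equals the standard normal cumulative probability of the mean strength minus the mean stress, divided by the square root of the sum of the two variances. The sketch below uses that result to illustrate why a narrow strength distribution is preferable to a wide one of the same mean. The function name and all numerical values are illustrative assumptions.

```python
from statistics import NormalDist


def normal_interference_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress) for independent, normally distributed stress and strength."""
    margin_mean = mu_strength - mu_stress
    margin_sd = (sd_strength ** 2 + sd_stress ** 2) ** 0.5
    # The difference (strength - stress) is itself normal, so the
    # reliability is the probability that this difference is positive.
    return NormalDist().cdf(margin_mean / margin_sd)


# Assumed values in psi: same mean strength, two different spreads.
narrow = normal_interference_reliability(25_000, 2_000, 15_000, 2_000)
wide = normal_interference_reliability(25_000, 5_000, 15_000, 2_000)
print("narrow strength spread:", narrow)
print("wide strength spread:  ", wide)
```

The narrow case gives the higher reliability because less of the strength distribution overlaps the stress distribution.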

As mentioned earlier, the stress and strength distributions may change
with time. To illustrate this point, Figure 6.2a, b display stress-strength
distributions for two different times $t_1$ and $t_2$. For the sake of
simplicity the stress distribution is assumed to be constant but the strength
distribution varies with time. Furthermore, the stress-strength distributions
need not be symmetrical and may be skewed or irregular. Once the
stress-strength probability density functions are known, reliability can be
computed by applying the interference theory.

6.6 SAFETY INDICES

The safety factor approach is a conventional design technique. This
method uses safety margins and safety factors that are simply arbitrary
multipliers. In some cases, these factors provide a satisfactory design if
they are established from past experience. In the days of modern
technology, however, new designs involve new applications and new materials,
and more consistent methods are needed. A mechanical component
design based entirely upon safety factors could be misleading, and may be
costly due to overdesign or could end in a catastrophic failure due to
underdesign. It is emphasized that whenever a designer makes use of safety
factors these must be based upon considerable experience with similar items.

6.6.1 Safety Factor

There are several different ways of defining a safety factor, as outlined in
reference 55. In reference 10, the theoretical definition of a safety factor
is

$$SF = \frac{\text{average value of failure governing strength, } \mu_{th}}{\text{average value of failure governing stress, } \mu_{st}} \tag{6.19}$$

This is a good measure, particularly when both the strength and stress
are normally distributed. This factor in a mechanical design is
always equal to or greater than unity. The concept of safety factor is
illustrated in Figure 6.3. When the variation of stress and/or of strength
is large, the safety factor becomes meaningless because the probability of
failure is then positive.

6.6.2 Safety Margin

In reference 55 the safety margin is defined in the following ways:

$$\theta_m = SF - 1 \tag{6.20}$$

or

$$\theta_m = \frac{\mu_{th} - \mu_{max}}{\sigma_{th}} \tag{6.21}$$

where $\mu_{th}$ = average strength
      $\mu_{max}$ = maximum stress
      $\sigma_{th}$ = standard deviation of strength

and

$$\mu_{max} = \mu_{st} + k\sigma_{st} \tag{6.22}$$

where $\mu_{st}$ = mean stress
      $\sigma_{st}$ = standard deviation of stress

Normally, the value of $k$ is between 3 and 6. It can be observed from the
discussion on safety margins that (a) it is a random variable just like its
counterpart, the safety factor, and (b) it presents the idea of separation of
stress and strength mean values.

Example 1. Suppose

$\sigma_{st} = 200$ psi, $k = 4$, $\sigma_{th} = 900$ psi,

$\mu_{th} = 25{,}000$ psi and $\mu_{st} = 12{,}000$ psi

Find the safety margin for the given data. By substituting the above
information in (6.21), we get

$$\theta_m = \frac{25{,}000 - (12{,}000 + 4 \times 200)}{900} = \frac{25{,}000 - 12{,}800}{900} = \frac{122}{9} = 13.6$$

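
The arithmetic of Example 1 can be reproduced with a short script. This is a minimal sketch of the safety margin defined as (average strength minus maximum stress) divided by the strength standard deviation, with maximum stress taken as mean stress plus k stress standard deviations. The function and argument names are illustrative assumptions.

```python
def safety_margin(mu_strength, sigma_strength, mu_stress, sigma_stress, k):
    """Safety margin (mu_strength - mu_max) / sigma_strength."""
    # Maximum stress is the mean stress plus k stress standard deviations.
    mu_max = mu_stress + k * sigma_stress
    return (mu_strength - mu_max) / sigma_strength


# Data of Example 1 (psi).
margin = safety_margin(mu_strength=25_000, sigma_strength=900,
                       mu_stress=12_000, sigma_stress=200, k=4)
print(round(margin, 1))
```

Raising k or the stress standard deviation increases the maximum stress and therefore lowers the margin for the same strength data.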
6.7 LOAD FACTORS 

In the last decade or so it has been realized that the loads as well as the
capacities of structures are not necessarily deterministic but are
probabilistic, because of the random variation in magnitude and the random
occurrence of loads. In this section we discuss the determination of load
factors in structural design, dealing mainly with dead and live
loads. This subject is discussed in detail in references 78 and 85, whose
authors initiated the earlier analyses of the topic.





6.7.1 Deterministic Resistance with Normally Distributed Loads

For the deterministic resistance the design load, $D_l$, may be formulated as
follows:

$$D_l = L_d \mu_d + L_l \mu_l \tag{6.23}$$

where $L_d$ = the dead load factor
      $L_l$ = the live load factor
      $\mu_d$ = nominal mean dead load
      $\mu_l$ = nominal mean live load

Suppose the live and dead loads are normally distributed random
variables with mean values of $\mu_l$ and $\mu_d$, respectively. The design
load then also follows the normal law. In the case of independent dead and
live loads, the design load, $D_l$, may be described by (6.24).

$$D_l = (\mu_d + \mu_l) + c\sqrt{\sigma_d^2 + \sigma_l^2} \tag{6.24}$$

where $\sigma_d$, $\sigma_l$ are the standard deviations of the dead and live
load and $c$ is the reliability coefficient for the combined dead and live
load. Also, the design load in terms of component loads may be written as

$$D_l = m_d + m_l = (\mu_d + c'\sigma_d) + (\mu_l + c'\sigma_l) \tag{6.25}$$

where $c'$ is the reliability coefficient for each load component, and

      $m_d$ = magnitude of component dead load
      $m_l$ = magnitude of component live load

By manipulating (6.23), (6.24), and (6.25), we obtain the following load
factor equations:

$$L_d = \frac{m_d}{\mu_d} = 1 + s c V_d \tag{6.26}$$

$$L_l = \frac{m_l}{\mu_l} = 1 + s c V_l \tag{6.27}$$

where

$$s = \frac{\sqrt{\sigma_d^2 + \sigma_l^2}}{\sigma_d + \sigma_l}$$

(obtained by equating (6.24) and (6.25), so that $c' = sc$),
$V_d = \sigma_d/\mu_d$ is the coefficient of variation of dead load, and
$V_l = \sigma_l/\mu_l$ is the coefficient of variation of live load.
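
The load factor relations for deterministic resistance can be sketched as follows. The sketch assumes the form L = 1 + s*c*V for each load, with s equal to the root-sum-square of the two load standard deviations divided by their plain sum (the value that makes the component-load form of the design load agree with the combined-load form). The function name and the numerical inputs are illustrative assumptions.

```python
import math


def load_factors(mu_d, sigma_d, mu_l, sigma_l, c):
    """Return (dead load factor, live load factor) for independent normal loads."""
    # s links the combined-load coefficient c to the per-component one (c' = s * c).
    s = math.hypot(sigma_d, sigma_l) / (sigma_d + sigma_l)
    v_d = sigma_d / mu_d  # coefficient of variation of dead load
    v_l = sigma_l / mu_l  # coefficient of variation of live load
    return 1.0 + s * c * v_d, 1.0 + s * c * v_l


# Assumed inputs: loads in arbitrary consistent units, c = 2.33.
L_d, L_l = load_factors(mu_d=100.0, sigma_d=10.0, mu_l=60.0, sigma_l=15.0, c=2.33)
design_load = L_d * 100.0 + L_l * 60.0
print(L_d, L_l, design_load)
```

The design load computed from the two factors matches the sum of the mean loads plus c times the root-sum-square of the standard deviations, which is a useful consistency check on any hand calculation.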

6.7.2 Normally Distributed Loads and Resistance

When the resistance follows a normal distribution, (6.26) and (6.27) are
modified to the following form:

where $V_R$ is the resistance coefficient of variation. When $V_R$ is equal to
zero, the resistance follows the deterministic law. The value of $c$ can be
determined from (6.30) when loads and resistance follow the normal
distribution:

where $\mu_R$ is the mean resistance and $c^*$ is the reliability coefficient
of the system.

By substituting (6.30) into (6.28) and (6.29), we can determine the load
factors for any desired level of reliability. The value of $c^*$ can
be obtained from the table of the error function. For example, at the
desired level of reliability, say $R = 0.9901$, $c^* = 2.33$. For a solved
numerical example see reference 86.
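
Instead of a printed table, the reliability coefficient can be obtained from the inverse of the standard normal cumulative distribution function. The sketch below uses the Python standard library for this; the function name is an illustrative assumption.

```python
from statistics import NormalDist


def reliability_coefficient(reliability):
    """Standard normal deviate whose cumulative probability equals the reliability."""
    return NormalDist().inv_cdf(reliability)


print(round(reliability_coefficient(0.9901), 2))
```

For a desired reliability of 0.9901 this reproduces the coefficient of about 2.33 quoted in the text.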

6.8 "DESIGN BY RELIABILITY" METHODOLOGY 

The "design by reliability" methodology is described in considerable detail
in references 49 and 50. To design an equipment or a component by taking
reliability into consideration, the following steps are needed:

1. Define the design problem in question.

2. List and identify all the associated design variables and parameters in
the problem.

3. Perform failure modes, effects, and criticality analysis (FMECA).

4. Determine the failure governing stress and strength functions and
distributions of a failure mode.

5. Use the failure governing stress and strength distributions to evaluate
the reliability of each critical failure mode.

6. Iterate the design until the assigned reliability goals are met.

7. Optimize the design under specified constraints such as cost, weight,
volume, reliability, maintainability, safety, performance, and so on.

8. Repeat the above steps for each vital component or device of a system.

9. Calculate the system reliability by applying the classical reliability
theory.

10. Iterate the design until the specified system reliability goal is fulfilled.

Step 4 is probed in depth in the following sections:

6.8.1 Determination of Failure Governing Stress Distribution 

The following steps are to be followed to determine the failure governing 
stress distribution: 

1. List and identify all the important failure modes. 

2. In the case of a fracture failure mode, if any, determine the most likely 
locations where the combination of stresses are likely to act which may 
result in component failure. 

3. At each location calculate the nominal stress of components.

4. Evaluate the maximum value of each component stress with the use of
the necessary stress modifying factors.

5. At each location combine all the stresses into the failure governing
stress in accordance with the particular failure mode being considered.

6. In the failure governing stress equation determine the distribution of
each nominal stress, modifying factor, and parameter.

7. Determine the failure governing stress distribution from the step 6
distributions.

8. Repeat steps 2-7 for each significant failure mode listed in step 1.

Readers who require more information should consult references 51 and 
54. 

6.8.2 Determination of the Failure Governing Strength Distribution 

To determine failure governing strength distribution, the following steps 
are outlined: 

1. Set up the failure governing strength criterion by taking the failure
modes into consideration. This criterion should be based upon the one
used to determine failure governing stress.

2. Evaluate the nominal strength.

3. Use appropriate strength factors to modify the nominal strength. This is
to convert the nominal strength obtained under the standardized and
idealized test conditions.

4. Determine the distribution of the nominal strength and of each modifying
factor and parameter associated with the failure governing strength equation.

5. Establish the failure governing strength distribution by utilizing the 
normal distributions of step 4. 

For more detailed information regarding the determination of the failure 
governing strength distribution, the interested reader should consult refer- 
ences 49 and 50. 



6.9 RELIABILITY DETERMINATION: CONSTANT
STRESS-STRENGTH INTERFERENCE THEORY MODELS



This section deals with situations in which the stress and strength are 
represented by well-defined probability density functions. Furthermore, 
the stress-strength distributions are not time dependent. 

When the probability density functions of both stress and strength are
known the component reliability may be determined analytically. Reliability
is defined as the probability that the failure governing stress will not
exceed the failure governing strength. In a mathematical equation it can be
written as

$$R = P(s < S) = P(S > s) \tag{6.31}$$

where $R$ = the reliability of a component or a device
      $P$ = the probability
      $S$ = the strength
      $s$ = the stress

Equation (6.31) can be rewritten in the following form:

$$R = \int_{-\infty}^{\infty} f_{st}(s) \left[ \int_{s}^{\infty} f_{Sth}(S)\,dS \right] ds \tag{6.32}$$

where $f_{st}(s)$ = the probability density function of the stress, $s$
      $f_{Sth}(S)$ = the probability density function of the strength, $S$

Reliability for a single failure mode can also be computed from (6.33) on
the basis that the stress will be less than the strength:

$$R = \int_{-\infty}^{\infty} f_{Sth}(S) \left[ \int_{-\infty}^{S} f_{st}(s)\,ds \right] dS \tag{6.33}$$

The above equation may be used to obtain numerical solutions if the
analytical solution is difficult to obtain. In addition, when the empirical
data are sufficient but the stress or strength distribution cannot be
identified, the graphical approach can be applied to obtain component
reliability.



6.9.1 Reliability Calculation by Graphical Approach

This technique makes use of the Mellin transforms, which can be applied
to any distribution. The Mellin transforms of the reliability equation (6.32)
are defined as

$$M = \int_{s}^{\infty} f_{Sth}(S)\,dS = 1 - F_{Sth}(s) \tag{6.34}$$

and

$$L = \int_{0}^{s} f_{st}(s)\,ds = F_{st}(s) \tag{6.35}$$

Equation (6.35) may be rewritten as

$$dL = f_{st}(s)\,ds \tag{6.36}$$

By substituting (6.36) and (6.34) into (6.32) we get

$$R = \int_{0}^{1} M\,dL \tag{6.37}$$

Obviously, $L$ takes values from 0 to 1. Therefore, if we plot (6.37), that
is, $M$ versus $L$, the area under the curve will represent the single failure
mode component reliability. A typical plot of (6.37) is shown in Figure 6.4.
Simpson's rule can be used to calculate the area under the $M$ versus $L$ curve.

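Example 2. Suppose the strength of a component follows the Rayleigh
distribution with a known scale parameter value of 15,000 psi. Similarly, the
stress follows a Weibull distribution with the shape parameter equal to 3
and the scale parameter value of 12,000 psi.
Therefore, the strength and stress density functions become

$$f_{Sth}(S) = \frac{2}{15{,}000} \left( \frac{S}{15{,}000} \right) \exp\left[ -\left( \frac{S}{15{,}000} \right)^2 \right] \tag{6.38}$$

and

$$f_{st}(s) = \frac{3}{12{,}000} \left( \frac{s}{12{,}000} \right)^2 \exp\left[ -\left( \frac{s}{12{,}000} \right)^3 \right] \tag{6.39}$$

By substituting (6.38) and (6.39) into (6.34) and (6.35), respectively, we get:

$$M = 1 - F_{Sth}(s) = \exp\left[ -\left( \frac{s}{15{,}000} \right)^2 \right] \tag{6.40}$$

$$L = F_{st}(s) = 1 - \exp\left[ -\left( \frac{s}{12{,}000} \right)^3 \right] \tag{6.41}$$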



Table 6.1 presents the tabulation of M and L for various values of s.
Figure 6.5 shows a plot of the values of M and L from Table 6.1.

Table 6.1

s (psi)      M       L
0            1       0
2,000        0.98    0.005
4,000        0.93    0.04
6,000        0.85    0.12
8,000        0.75    0.26
10,000       0.64    0.44
12,000       0.53    0.63
14,000       0.42    0.80
16,000       0.32    0.91
18,000       0.24    0.97
20,000       0.17    0.99
22,000       0.12    0.997

Figure 6.5 M versus L plot.

Using Simpson's rule, the component reliability R is estimated from
Figure 6.5:

$$R = \frac{0.25}{3}\left[ y_0 + 4y_1 + 2y_2 + 4y_3 + y_4 \right]$$

$$R = \frac{0.25}{3}\left\{ 1 + 4 \times 0.75 + 2 \times 0.61 + 4 \times 0.46 + 0.12 \right\}$$

$$R = 0.598$$

Reliability calculation when stress and strength data cannot be represented
by any existing distribution is discussed in reference 79.
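
The graphical procedure of Example 2 can be imitated numerically. The sketch below assumes the Rayleigh strength (scale 15,000 psi) and Weibull stress (shape 3, scale 12,000 psi) of the example, evaluates M and L on a fine grid of stress values, and accumulates the area under the M versus L curve with the trapezoidal rule. The trapezoidal rule is used here in place of Simpson's rule purely for simplicity, and the function names, grid size, and upper limit are illustrative assumptions.

```python
import math

STRENGTH_SCALE = 15_000.0  # Rayleigh strength scale parameter, psi
STRESS_SCALE = 12_000.0    # Weibull stress scale parameter, psi
STRESS_SHAPE = 3.0         # Weibull stress shape parameter


def m_value(s):
    """M = 1 - F_strength(s): probability that the strength exceeds s."""
    return math.exp(-((s / STRENGTH_SCALE) ** 2))


def l_value(s):
    """L = F_stress(s): probability that the stress does not exceed s."""
    return 1.0 - math.exp(-((s / STRESS_SCALE) ** STRESS_SHAPE))


def area_under_m_versus_l(s_max=40_000.0, steps=4_000):
    """Trapezoidal estimate of the area under the M versus L curve."""
    area = 0.0
    prev_m, prev_l = m_value(0.0), l_value(0.0)
    for i in range(1, steps + 1):
        s = s_max * i / steps
        cur_m, cur_l = m_value(s), l_value(s)
        area += 0.5 * (prev_m + cur_m) * (cur_l - prev_l)
        prev_m, prev_l = cur_m, cur_l
    return area


for s in range(0, 24_000, 2_000):
    print(s, round(m_value(s), 2), round(l_value(s), 3))
print("estimated reliability:", area_under_m_versus_l())
```

The printed M and L columns can be compared with Table 6.1, and the area estimate with the Simpson's rule value obtained from the figure. Small differences are expected because the hand calculation uses only five ordinates read from the plot.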

6.9.2 Analytical: Constant Stress-Strength Interference Theory Models*

This section presents three interference theory models when probability
density functions are defined.

*For these models it is assumed that the component has only one significant
failure mode.

Component Reliability Determination for Exponential Stress and Strength
Distributions. Both stress and strength probability density functions are
defined as

$$f_{st}(s) = \lambda_{st} e^{-\lambda_{st} s}, \qquad 0 \le s < \infty \tag{6.42}$$

and

$$f_{Sth}(S) = \lambda_{Sth} e^{-\lambda_{Sth} S}, \qquad 0 \le S < \infty \tag{6.43}$$

where $\lambda_{st}$ = the reciprocal of the mean value of stress, $\bar{s}$
      $\lambda_{Sth}$ = the reciprocal of the mean value of strength, $\bar{S}$

By utilizing expression (6.33), the component reliability, $R_c$, can be
determined:

$$R_c = \int_{0}^{\infty} f_{Sth}(S) \left[ \int_{0}^{S} f_{st}(s)\,ds \right] dS \tag{6.44}$$

where

$$\int_{0}^{S} f_{st}(s)\,ds = 1 - e^{-\lambda_{st} S} \tag{6.45}$$

Therefore by substituting (6.45) into (6.44) we get

$$R_c = 1 - \int_{0}^{\infty} \lambda_{Sth} e^{-(\lambda_{Sth} + \lambda_{st}) S}\,dS = \frac{\lambda_{st}}{\lambda_{st} + \lambda_{Sth}} = \frac{\bar{S}}{\bar{S} + \bar{s}} \tag{6.46}$$

Dividing the numerator and denominator of expression (6.46) by $\bar{S}$ we get:

$$R_c = \frac{1}{1 + \rho}, \qquad \bar{S} \ne 0 \tag{6.47}$$

where $\rho = \bar{s}/\bar{S}$ for $\rho \le 1$. Values of $R_c$ are presented
in Table 6.2 for the various values of $\rho$. A plot of (6.47) is shown in
Figure 6.6.
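
The tabulated values follow directly from the closed-form result for exponentially distributed stress and strength, in which the reliability is 1/(1 + rho) and rho is the ratio of mean stress to mean strength. A minimal sketch that regenerates such a table is given below; the function name is an illustrative assumption.

```python
def exponential_interference_reliability(mean_stress, mean_strength):
    """Reliability 1 / (1 + rho) with rho = mean_stress / mean_strength."""
    rho = mean_stress / mean_strength
    return 1.0 / (1.0 + rho)


# Tabulate reliability for rho = 1.0, 0.9, ..., 0.0 (mean strength fixed at 1).
for tenths in range(10, -1, -1):
    rho = tenths / 10
    print(rho, round(1 + rho, 1), round(exponential_interference_reliability(rho, 1.0), 2))
```

Even with the mean strength ten times the mean stress (rho = 0.1) the reliability is only about 0.91, which reflects how heavily the two exponential distributions overlap.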

Table 6.2

rho     1 + rho     R = 1/(1 + rho)
1       2           0.5
0.9     1.9         0.53
0.8     1.8         0.56
0.7     1.7         0.59
0.6     1.6         0.63
0.5     1.5         0.67
0.4     1.4         0.71
0.3     1.3         0.77
0.2     1.2         0.83
0.1     1.1         0.91
0       1           1

6.9.3 Component Reliability Determination when Stress and Strength
Follow Rayleigh Distribution

Both Rayleigh stress and strength density functions are defined as follows:

$$f_{st}(s) = 2\lambda_{st}\, s\, e^{-\lambda_{st} s^2}, \qquad 0 \le s < \infty \tag{6.48}$$

and

$$f_{Sth}(S) = 2\lambda_{Sth}\, S\, e^{-\lambda_{Sth} S^2}, \qquad 0 \le S < \infty \tag{6.49}$$




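where $\lambda_{st}$ = the stress parameter
      $\lambda_{Sth}$ = the strength parameter

Component reliability is determined by substituting (6.48) and (6.49) into
(6.32):

$$R_c = \int_{0}^{\infty} 2\lambda_{st}\, s\, e^{-\lambda_{st} s^2} \left[ \int_{s}^{\infty} 2\lambda_{Sth}\, S\, e^{-\lambda_{Sth} S^2}\,dS \right] ds = \int_{0}^{\infty} 2\lambda_{st}\, s\, e^{-(\lambda_{st} + \lambda_{Sth}) s^2}\,ds$$

Figure 6.6 Component reliability versus mean stress-strength ratio.

Let $x = s^2$, so that $dx = 2s\,ds$. Then

$$R_c = \lambda_{st} \int_{0}^{\infty} e^{-(\lambda_{st} + \lambda_{Sth}) x}\,dx = \frac{\lambda_{st}}{\lambda_{st} + \lambda_{Sth}} \tag{6.50}$$
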



6.9.4 Component Reliability Calculation with Normally Distributed Stress
and Gamma Distributed Strength

Stress and strength probability density functions are defined as

$$f_{st}(s) = \frac{1}{\sigma_{st}\sqrt{2\pi}} \exp\left[ -\frac{(s - \mu_{st})^2}{2\sigma_{st}^2} \right], \qquad -\infty < s < \infty \tag{6.51}$$

and

$$f_{Sth}(S) = \frac{\lambda^{\beta}}{\Gamma(\beta)} S^{\beta - 1} e^{-\lambda S}, \qquad 0 \le S < \infty \tag{6.52}$$

where $\beta$ and $\lambda$ are the shape and scale parameters, respectively,
and $\mu_{st}$ and $\sigma_{st}$ are the mean and the standard deviation,
respectively. By substituting the probability density functions (6.51) and
(6.52) into (6.32) and integrating, the following reliability expression is
obtained:

«-0 1-0 * • 

where 

| *- [**>-(*)«•-<>«] 

where J is the incomplete gamma function. 

I J / Hu-o?,^ 1 



'(s+1) VI 



/ n*t-*St 
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For the detailed derivation of (6.53) see reference 102. Many other inter- 
ference theory models to calculate component reliability are developed in 
references 45, 83, 101, and 102. These models are developed for the
following:

1. Normally distributed stress and strength.

2. Log-normally distributed stress and strength.

3. Exponentially (normally) distributed strength and normally
(exponentially) distributed stress.

4. Gamma distributed stress and strength.

5. Weibull distributed strength and normally distributed stress.

6. Weibull distributed stress and strength.

7. Weibull distributed strength and extreme value distributed stress.

8. Maxwellian distributed stress and Weibull distributed strength.



6.9.5 Component Reliability with Multiple Failure Modes 

Reliability of a component with many independent failure modes is given
by

$$R = \prod_{i=1}^{n} R_i \tag{6.54}$$

where $R$ = the overall component reliability
      $n$ = the number of significant failure modes
      $R_i$ = the reliability of a significant failure mode $i$

Similarly, the system reliability can be computed for a series
configuration, the component reliability being obtained by applying (6.54)
or directly from the stress-strength models (i.e., if the component under
study has only one significant failure mode).
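
The product rule for independent failure modes is easily sketched in code. The function name and the three mode reliabilities used below are illustrative assumptions.

```python
import math


def component_reliability(mode_reliabilities):
    """Overall reliability as the product of independent failure mode reliabilities."""
    return math.prod(mode_reliabilities)


# Assumed reliabilities for three independent failure modes.
print(component_reliability([0.99, 0.95, 0.98]))
```

The same function applies to a series system if each entry is taken to be the reliability of one component rather than one failure mode.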

6.9.6 Chain Model 

This model represents a situation in which a chain is composed of $n$
identical series links subject to the same environmental stress
[83]. The probability of any link having strength $S_0$ or greater is given by

$$P(S \ge S_0) = \int_{S_0}^{\infty} f_{Sth}(S)\,dS \tag{6.55}$$

In the case of $n$ identical and independent links, the probability
that the chain has strength $S_0$ or greater is given by

$$P_{ch}(S \ge S_0) = \left[ \int_{S_0}^{\infty} f_{Sth}(S)\,dS \right]^{n} \tag{6.56}$$

To obtain the probability density function of the chain strength,
$f_{ch}(S)$, differentiate expression (6.56) with respect to $S$ (the density
is the negative of the derivative of this exceedance probability):

$$f_{ch}(S) = n f_{Sth}(S) \left[ \int_{S}^{\infty} f_{Sth}(x)\,dx \right]^{n-1} \tag{6.57}$$

When all the chain links are under the same environmental stress, the
chain reliability $R_{ch}$ can be obtained by substituting (6.57) into (6.33):

$$R_{ch} = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{S} f_{st}(s)\,ds \right] n f_{Sth}(S) \left[ \int_{S}^{\infty} f_{Sth}(x)\,dx \right]^{n-1} dS \tag{6.58}$$

The reliability in the above equation may be determined by graphical,
analytical, or numerical techniques.
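
A numerical evaluation of the chain model can be sketched as follows. The sketch assumes nonnegative strengths, so the outer integral starts at zero, and it integrates the product of the stress distribution function and the chain strength density (n times the link density times the link exceedance probability raised to the power n - 1) with the trapezoidal rule. Exponential stress and link strength are chosen only because they give a simple closed-form check; the function names and all parameter values are illustrative assumptions.

```python
import math


def chain_reliability(stress_cdf, link_pdf, link_exceedance, n, upper, steps=20_000):
    """Trapezoidal integration of stress_cdf(S) * n * f(S) * [P(link > S)]**(n - 1)."""
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        strength = i * h
        integrand = (stress_cdf(strength) * n * link_pdf(strength)
                     * link_exceedance(strength) ** (n - 1))
        # End points carry half weight in the trapezoidal rule.
        total += integrand * (0.5 if i in (0, steps) else 1.0)
    return total * h


# Assumed rates: exponential stress (mean 10) and exponential link strength (mean 100).
RATE_STRESS = 0.1
RATE_LINK = 0.01
N_LINKS = 5

numeric = chain_reliability(
    stress_cdf=lambda x: 1.0 - math.exp(-RATE_STRESS * x),
    link_pdf=lambda x: RATE_LINK * math.exp(-RATE_LINK * x),
    link_exceedance=lambda x: math.exp(-RATE_LINK * x),
    n=N_LINKS,
    upper=400.0,
)
# Closed form for this special case: the weakest of n exponential links is
# exponential with rate n * RATE_LINK.
closed_form = RATE_STRESS / (RATE_STRESS + N_LINKS * RATE_LINK)
print(numeric, closed_form)
```

Adding links lowers the chain reliability, since the chain is only as strong as its weakest link.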



6.9.7 Stress-Strength Time-Dependent Models

In the previous sections, we considered stress-strength models where stress 
and strength were independent of time. In real life, however, this may not 
be necessarily true. The component strength may change with time and a 
component may experience repeated application of stresses. In other 
words, the stress or load may follow a random pattern with respect to time 
t. A hypothetical pattern is shown in Figure 6.7. 

This area of mechanical reliability still remains to be explored further. 
The interested readers are advised to consult references 10, 45, 84, and 87. 




Figure 6.7 A hypothetical random stress spectrum.

6.10 OPTIMIZATION OF MECHANICAL COMPONENT
RELIABILITY

A redundant system can be optimized subject to constraints such as cost,
weight, and volume. To optimize system reliability, traditional operations
research techniques such as Lagrange multiplier, linear, integer, and
dynamic programming are applicable. These techniques can also be used to
optimize the reliability of mechanical components.

6.10.1 Reliability Optimization of a Mechanical Component with
Normally Distributed Stress and Strength

The following reliability equation is taken from reference 45:

$$R = \frac{1}{\sqrt{2\pi}} \int_{z}^{\infty} e^{-x^2/2}\,dx \tag{6.59}$$

where $z = -(\bar{S} - \bar{s})(\sigma_{Sth}^2 + \sigma_{st}^2)^{-1/2}$
      $\bar{s}$ = mean stress
      $\bar{S}$ = mean strength
      $\sigma_{Sth}, \sigma_{st}$ = standard deviations of strength and stress

In formulating this model it is assumed that the stress and strength are
statistically independent. To maximize component reliability, it is obvious
that the value of the lower limit of expression (6.59) should be as low as
possible. Therefore, the problem of minimizing the total cost subject to a
desired component reliability may be formulated as follows:

$$\text{minimize } k = k_1(\bar{S}) + k_2(\sigma_{Sth}) + k_3(\bar{s}) + k_4(\sigma_{st})$$

$$\text{subject to } (\bar{S} - \bar{s})(\sigma_{Sth}^2 + \sigma_{st}^2)^{-1/2} \ge y \tag{6.60}$$

where $k$ = the total cost
      $k_1(\bar{S})$ = the cost function of the mean strength (monotonically
                  increasing function)
      $k_3(\bar{s})$ = the cost function of the mean stress (monotonically
                  decreasing function)
      $k_2(\sigma_{Sth})$ = the strength standard deviation cost function
                  (monotonically decreasing function)
      $k_4(\sigma_{st})$ = the stress standard deviation cost function
                  (monotonically decreasing function)
      $y$ = obtained by the coupling equation for the desired
                  reliability level

The Lagrangian equation for the above problem becomes:

$$F(\bar{S}, \bar{s}, \sigma_{Sth}, \sigma_{st}, \lambda) = k + \lambda\left[ \bar{S} - \bar{s} - y(\sigma_{Sth}^2 + \sigma_{st}^2)^{1/2} \right] \tag{6.61}$$

To find the optimum solution, differentiate (6.61) with respect to each
variable $\bar{S}$, $\bar{s}$, $\sigma_{Sth}$, $\sigma_{st}$, and $\lambda$, and
equate each derivative to zero. The following equations are obtained:

$$\bar{S} - \bar{s} - y(\sigma_{Sth}^2 + \sigma_{st}^2)^{1/2} = 0 \tag{6.62}$$

$$\dot{k}_4(\sigma_{st}) - \lambda y \sigma_{st} (\sigma_{Sth}^2 + \sigma_{st}^2)^{-1/2} = 0 \tag{6.63}$$

$$\ddot{k}_2(\sigma_{Sth}) - \lambda y \sigma_{Sth} (\sigma_{Sth}^2 + \sigma_{st}^2)^{-1/2} = 0 \tag{6.64}$$

$$k_3'(\bar{s}) = \lambda \tag{6.65}$$

$$k_1''(\bar{S}) = -\lambda \tag{6.66}$$

where single overdots and primes represent partial derivatives with respect
to $\sigma_{st}$ and $\bar{s}$, respectively, and double overdots and primes
represent partial derivatives with respect to $\sigma_{Sth}$ and $\bar{S}$,
respectively. The values of $\bar{S}$, $\bar{s}$, $\sigma_{Sth}$, $\sigma_{st}$,
and $\lambda$ can be found by solving (6.62)-(6.66) to obtain all local optima.
To choose a global optimal solution, evaluate the objective function (6.60)
for all the local optimal solutions. For a more detailed analysis and
examples on mechanical component reliability optimization, one should
consult references 45 and 95.



6.11 CONCLUDING REMARKS

Although the interference stress-strength modeling is a promising tech- 
nique for calculating the reliability of a mechanical component, there are 
several problem areas to be overcome. Some of these problems are out- 
lined as follows: 

1. The representative stress and environmental conditions under which the
component will operate may be difficult to estimate with certainty at
the design stage because of the lack of field data.

2. Most of the material properties are time dependent. For some practical
purposes this factor may be disregarded because of their slow change,
but generally the time dependency cannot be ignored. Due to the lack
of variability data on material properties, further assumptions regarding
time dependency may be required.

3. Although there is no lack of mathematical techniques or probabilistic
models for reliability evaluation, further refinement of these
techniques and models will be useful to improve mechanical reliability
prediction.

7 Human Reliability

7,1 INTRODUCTION 

Numerous systems are interconnected by human links. In earlier
reliability analysis, attention was directed only to equipment, and the
reliability of the human element was neglected.

Williams [94] recognized this shortcoming in the late 1950s and pointed
out that realistic system reliability analysis must include the human aspect.
Ever since the beginning of the last decade there has been considerable
interest in human-initiated equipment failures and their effect on system
reliability.

According to reference 50, about 20-30 percent of failures are due, directly or
indirectly, to human error. Furthermore, according to reference 19,
about 10-15 percent of the total failures are directly related to human
errors. These are mainly due to wrong actions, maintenance errors,
misinterpretation of instruments, and so on.

Subsequent work by others is listed in Section 7.10. This research deals
mainly with human error data banks, human error classification
schemes, determining the significance of errors to system operation, human
error allocation, and human reliability modeling in the continuous time
domain.

7.1.1 Human Reliability Definition 

According to reference 49, human reliability is defined as the probability
that a job or task will be successfully completed by personnel at any
required stage in system operation within a required minimum time (if the
time requirement exists).

7.1.2 Human Error

Human error is defined [19] as a failure to perform a prescribed task (or
the performance of a prohibited action), which could result in damage to
equipment and property or disruption of scheduled operations. In real life
most systems require some human participation irrespective of the degree
of automation. It is said that wherever people are involved, errors will be
made. These errors occur regardless of their training, skill, or experience.
Therefore, predicting equipment reliability without considering human
reliability will not present a true picture of that reliability.



7.2 HUMAN STRESS-PERFORMANCE EFFECTIVENESS

According to reference 19, the human performance and stress follow the 
relationship shown in Figure 7.1. This relationship shows that the human 
error rate for a particular task follows a curvilinear relation to the imposed
stress. At a very low stress, the task is dull and unchallenging; therefore
most operators will not perform effectively and the performance will not 
be at the optimal level. When the stress is at a moderate level, the operator 
performs at his optimum level. The moderate level may be interpreted as 
high enough stress to keep the operator alert. At a still higher stress level, 
the human performance begins to decline. This decline is mainly due to 
fear, worry, or other types of psychological stress. It follows from Figure 
7.1 that at the highest stress level, the human reliability is at its lowest 
level. 



7.3 CONCEPT OF HUMAN ERROR

According to reference 33, a human error occurs if any one of the
following happens:

1. The operator or any human pursues a wrong goal.

2. The required goal is not met because the operator acted wrongly.

3. The operator fails to act in the moment of need.

The human errors may be divided into three levels, as shown in Figure 7.2.
Errors may be corrected at each level of human error shown in
Figure 7.2. For example, future human errors may be prevented at level 1.
At level 2 a future incident can be avoided by correcting the wrong action
due to human error. In the case of level 3 one could prevent the same
situation from occurring again.



7.4 TYPES OF HUMAN ERROR 

The author of reference 50 has categorized the human errors as follows:

1. Design error. This error results from inadequate design. For example,
the controls and displays are so far apart that an operator finds
difficulty in using both of them effectively.

2. Operator error. This occurs if the operating personnel fail to follow
correct procedures, or there is a lack of correct procedures.

3. Fabrication error. This error occurs at the fabrication stage as a result of (a)
poor workmanship, for example, incorrect soldering; (b) use of wrong
material; (c) the fabrication is not according to the blueprint
requirements.

4. Maintenance error. This type of error occurs in the field. It is normally
due to incorrect installation or repair of the equipment.

5. Contributory error. These are errors that are difficult to define either as
human or related to equipment.

6. Inspection error. This error is associated with accepting out-of-tolerance
component or equipment, or rejecting in-tolerance component or equipment.

7. Handling error. The handling error occurs due to inappropriate storage.



7.5 CAUSES OF HUMAN ERRORS 

This section presents the main causes of human errors, some of which have
already been discussed in Section 7.4. Some of the main causes are as
follows:

1. Poor training or skill of the operating personnel.

2. Inadequate maintenance or operating procedures.

3. Poor job environments, for example, poor accessibility, crowded space, and [...]

4. Poor or inadequate handling of equipment or tools.

5. Poor motivation of operators or the maintenance personnel, which
keeps their performance from being at the optimum level.



7.6 HUMAN UNRELIABILITY DATA BANKS 

The material presented in this section is taken from reference 47. Therefore,
the interested reader can consult this reference for further details.

Human error data banks may be divided into the following three
categories:

1. Experimentally based data banks.

2. Field-based data banks.

3. Subjectively based data banks.

7.6.1 Experimentally Based Data Banks

This type of data bank is based upon laboratory sources and is gathered in
the laboratory. The main advantage of this data is that it is the least
influenced by the subjective elements that may produce some error.
Therefore, one can have more confidence in such data banks. One must,
however, be aware that no matter how carefully these data banks are
developed, there is always a considerable amount of subjective element
present.

The well-known data bank based on experimental findings is the
Data Store [52]. This data bank is based upon 164 selected studies.



7.6.2 Field-Based Data Banks

These data banks are based upon the operational data and are more
realistic than the experimentally based data banks. However, the field-based
data banks are rather difficult to establish because these banks are based
upon real activities occurring in the operating environment. The results
obtained from these banks are more satisfactory than those obtained from
the experimental sources whose tasks are often contrived.

At present there are two noteworthy field-based data banks, which are
described in references 93 and 78. The one presented in reference 93 is
called the Operational Performance Recording and Evaluating Data System
(OPREDS), which permits the automatic monitoring of all operator
actions. However, it is only applicable to limited cases (e.g., switch actions).
The other proposed data bank is called the Sandia Human Error Rate
Bank (SHERB) [78].



7.6.3 Subjectively Based Data Banks

These data banks are based upon expert opinions and have two attractive
features:

1. They are comparatively easy to develop because a large amount of data
can be collected from a small number of expert respondents.

2. They are cheaper to develop.

The subjectively based data is obtained by using less rigorous techniques
such as DELPHI [13]. This technique narrows the guess-estimate variations
of the field experts by feeding back the end result of the study to
individual judges or experts. It makes them reconsider their guess-estimates
until some form of consensus is reached. This method is already effective at
the Naval Personnel Research and Development Center [36].

The following requirements must be satisfied if these banks are to be
used in the human reliability analysis:

1. Validity. A subjective data bank will contain some error. Therefore, we
should be prepared to accept a somewhat lower accuracy of such data
banks as compared to the experimental data ones.

2. Expert Judgment. The subjective data should be collected only from
those personnel who are recognized as highly skilled to perform the tasks in
question and, in addition, have observed others performing such tasks.
For example, it is better to obtain data from operators rather than from
human reliability experts.

3. Performance Dimensions. The technique to be used should be decided
very carefully, keeping in mind the dimensions of the performance
being estimated.

4. Judgment Description Level. The performance-shaping factors associated
with these estimates must be determined at an early stage. Furthermore,
the types of errors to be included for a particular task should be
clarified.

5. Procedure Specification. To obtain subjective estimates, the applicable
procedure should be specified, for example, whether it is DELPHI or
paired comparisons.

The main advantage of this type of data bank is the coverage of a wide 
range of parameters for which failure data is required. 



7.7 HUMAN RELIABILITY MODELING IN CONTINUOUS TIME 
DOMAIN 

The material presented in this section is based on reference 63. Some of the 
typical examples of such tasks are scope monitoring, aircraft maneuvering, 
and missile countdown. This type of modeling is analogous to the classical 
reliability modeling. 

The generalized human performance reliability function for continuous
time tasks is derived in the following section. (Note: for the discrete case
consult reference 62.)

7.7.1 Human Performance Reliability Function in Continuous
Time Domain

Although not all human tasks are in the continuous time domain, tasks such as
vigilance, monitoring, and tracking fall in this category. In the case of
continuous tasks, the probability of occurrence of human error in the time
interval $(t, t + \delta t)$, given $E_1$, is given by

$$P(E_2/E_1) = e(t)\,\delta t \tag{7.1}$$

where $e(t)$ = the human error rate at time t; this is analogous to the
hazard rate, $z(t)$, in the classical reliability theory
$E_1$ = an errorless performance event of duration t
$E_2$ = an event that the human error will occur in time interval
$(t, t + \delta t)$

The joint probability of the errorless performance may be expressed as
follows:

$$P(\bar{E}_2/E_1)\,P(E_1) = P(E_1) - P(E_2/E_1)\,P(E_1) \tag{7.2}$$

where $\bar{E}_2$ denotes the event that error will not occur in interval $[t, t + \delta t]$.
The above equation may be rewritten as

$$R_h(t) - R_h(t)\,P(E_2/E_1) = R_h(t + \delta t) \tag{7.3}$$

where $R_h(t)$ is human reliability. Expression 7.2 represents an errorless
performance probability over intervals $[0, t]$ and $[t, t + \delta t]$.
By substituting (7.1) into (7.3) we get

$$\frac{R_h(t + \delta t) - R_h(t)}{\delta t} = -e(t)\,R_h(t) \tag{7.4}$$

In the limiting case, the above expression becomes

$$\frac{dR_h(t)}{dt} = -e(t)\,R_h(t) \tag{7.5}$$

To solve the differential equation we may write, for known initial
conditions,

$$\int_0^t e(t)\,dt = -\int_1^{R_h(t)} \frac{1}{R_h(t)}\,dR_h(t) \tag{7.6}$$

The solution to the differential equation (7.5) is

$$R_h(t) = \exp\left[-\int_0^t e(t)\,dt\right] \tag{7.7}$$

This is the general expression to compute human reliability.
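The general expression for human reliability, the exponential of the negative integral of the error rate e(t) over [0, t], can be evaluated numerically for any error-rate function. The sketch below is illustrative only: the midpoint-rule integration, the function names, and the parameter values are assumptions and are not taken from reference 63. The Weibull-type error rate is used because it has a closed form to compare against.

```python
import math

def human_reliability(error_rate, t, steps=10_000):
    """R_h(t) = exp(-integral of e(u) du over [0, t]), using the midpoint rule."""
    if t <= 0:
        return 1.0
    h = t / steps
    integral = sum(error_rate((i + 0.5) * h) for i in range(steps)) * h
    return math.exp(-integral)

def weibull_error_rate(scale, shape):
    """Time-variant error rate e(u) = (shape/scale) * (u/scale)**(shape - 1)."""
    return lambda u: (shape / scale) * (u / scale) ** (shape - 1)

# Hypothetical values: a constant error rate of 0.02 per hour over 10 hours,
# and a Weibull-type error rate with scale 50 hours and shape 1.5.
r_constant = human_reliability(lambda u: 0.02, 10.0)
r_weibull = human_reliability(weibull_error_rate(50.0, 1.5), 10.0)
```

For the constant rate the result reduces to exp(-0.02 x 10), and for the Weibull-type rate to exp(-(10/50)^1.5), which gives a simple check on the numerical integration.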




7.7.2 Reliability Quantifiers for Time Continuous Human
Performance Tasks

These parameters are analogous to those of the classical reliability theory. Time
continuous human performance task quantifiers are defined as follows:

Mean Time to Human Initiated Failure (MTHIF). This index is analogous
to the mean time to failure (MTTF) in the classical reliability theory. This
quantifier is used for time continuous tasks such as undershooting a
landing aircraft or overpressurizing a missile fuel tank.

Mean-Time-to-First-Human-Error (MTFHR). This quantifier is analogous
to the mean time to first failure (MTTFF) in the classical theory. The
MTFHR may be used for cases where the occurrence of the first human
error is highly critical.

Mean Time Between Human Errors (MTBHE). This quantifier is directly
translated from the mean time between failures (MTBF) of the classical
reliability theory. This indicator may be used where the human errors are
not so critical. For example, it may be used for measuring the occurrence of
defective parts due to human errors at a production line.

7.7.3 Experimental Justification of the Time Continuous Human
Performance Model

To justify the time continuous task model discussed earlier, the authors of
reference 63 have developed a simple model to obtain human error data.
The main feature of this experiment was to observe a clock-type light
display. The operator was required to respond to a failed light event by
pressing a hand-held switch.
The following types of data were collected from this experiment:

1. Miss error. The operator (subject) did not detect the failed light.
2. False alarm error. The operator (subject) responds as if a
failed-light event has occurred when it did not occur in reality.

The failure data collected from this study was analyzed by a graphical
technique and the Kolmogorov-Smirnov d statistic.

This study reported that the human error rate is time variant. Furthermore,
this experiment tested the following types of errors:

1. Times to first miss error.

2. Times to false alarm error.

3. Combined miss and false alarm error.

The Weibull, gamma, and log-normal density functions emerged as the
representative distributions for the goodness of fit.

7.7.4 Human Performance Effectiveness Function (Correctability) in
Time Continuous Domain

The correctability function $C_h(t)$ is concerned with the correction of
self-generated human errors. In reference 63, it is defined as the probability
that a task error will be corrected in time t subject to the stress constraint
inherent in the nature of the task and its environment. In other words, the
correctability function may be defined as

$$C_h(t) = P\{\text{correction of error in time } t / \text{stress}\} \tag{7.8}$$

The time derivative of the not-correctability function $\bar{C}_h(t)$ may be defined
as

$$\bar{C}_h'(t) = -\frac{1}{N}\,N_c'(t) \tag{7.9}$$

where the prime denotes differentiation with respect to time t, $N$ is the
total number of task errors requiring correction, $N_c(t)$ is the number of
times task correction is accomplished after time t, and $N_{\bar{c}}(t)$ is
the number of times the task is not completed after time t.
Equation 7.9 may be rewritten in the following form:

$$N\,[N_{\bar{c}}(t)]^{-1}\,\bar{C}_h'(t) = -N_c'(t)\,[N_{\bar{c}}(t)]^{-1} \tag{7.10}$$

The term $N_c'(t)\,[N_{\bar{c}}(t)]^{-1}$ on the right-hand side of (7.10) represents the
instantaneous task correction rate $c(t)$. Hence, since $\bar{C}_h(t) = N_{\bar{c}}(t)/N$,
(7.10) may be rewritten as

$$[\bar{C}_h(t)]^{-1}\,\bar{C}_h'(t) + c(t) = 0 \tag{7.11}$$

By solving the above differential equation for given initial conditions we
get

$$\bar{C}_h(t) = \exp\left[-\int_0^t c(t)\,dt\right] \tag{7.12}$$

since

$$C_h(t) + \bar{C}_h(t) = 1$$

Therefore,

$$C_h(t) = 1 - \exp\left[-\int_0^t c(t)\,dt\right] \tag{7.13}$$

The above equation is a general expression. It holds for both constant
and instantaneous correction rates. The experimental results, with data, for
the above function are presented in reference 63. This experiment dealt
with the operation of a standard £-type manual control stick grip, subject
to two degrees of freedom representing the pitch and roll motions of an
aircraft in response to the instrument altitude pointer movement.

These results indicate that for both vigilance and compensatory tracking
tasks, the Weibull density function is a suitable fit for the time to first error
correction. On the other hand, the log-normal is equally applicable for the
time for correction of errors.
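As a quick check on the general correctability expression, consider the special case of a constant correction rate, c(t) = c. This case and the numerical values are chosen here only for illustration and are not results from reference 63.

```latex
C_h(t) = 1 - \exp\left[-\int_0^t c \, dt\right] = 1 - e^{-ct}
% Hypothetical values: c = 0.5 corrections per minute, t = 2 minutes
C_h(2) = 1 - e^{-1} \approx 0.632
```

Under this assumption roughly 63 percent of task errors would be corrected within the first 1/c time units, which mirrors the behavior of the exponential distribution in classical reliability theory.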

7.8 HUMAN ERROR PREDICTION TECHNIQUE

This technique is relatively well known among human reliability
experts. It is known as THERP (technique for human error rate prediction).
THERP, which is discussed in detail in reference 79, is based upon the
classical analysis method. The basic steps associated with THERP are

1. List main system failure events.

2. List and analyze human related functions.

3. Obtain estimates for the human error rates.

4. Determine human error effects on the system failure events in question.

5. Make necessary recommendations and necessary changes in the system
in question. At the end compute new failure rates for the system under
study.



7.8.1 Probability Tree Analysis 

This is one of the main techniques for human reliability analysis. Success
or failure of each critical human action or associated event is assigned a
conditional probability. The outcome of each event is represented by the
branching limbs of the probability tree. The total probability of success for
a particular operation is obtained by summing up the probabilities
associated with the end point of the success path through the probability tree
diagram. This technique, with some refinement, can include factors such as
time stress, emotional stress, interaction stress, interaction effects, and
equipment failures.

Some of the advantages of this technique are as follows:

1. It serves as a visibility tool.

2. The mathematical computations are simplified, which in turn decreases
the probability of occurrence of errors due to computation.

3. The human reliability analyst can estimate conditional probability
readily, which may otherwise be obtained from complicated probability
equations.

Figure 7.3 A hypothetical task probability tree.

Example. Assume that an operator performs two tasks, say x and y (task
x is performed before task y). In addition, assume that tasks x and y can be
performed either correctly or incorrectly. In other words, the incorrectly
performed tasks are the only errors that can occur in this situation. Draw
the probability tree for this example and obtain the overall probability
of performing the system task incorrectly. In this example we assume that the
probabilities are statistically independent.

This example states that the operator can perform task x correctly or
incorrectly. Later, the operator may proceed to perform task y, which also
has two different possibilities (correct and incorrect). The following
notations were used to define the probability tree diagram as shown in Figure
7.3:

$P_s$ = probability of task success
$P_f$ = probability of failure to accomplish required task
$s$ = success
$f$ = failure
$P_x$ = probability of success in performing task x
$P_y$ = probability of success in performing task y
$\bar{P}_x$ = probability of failure to perform task x
$\bar{P}_y$ = probability of failure to perform task y

The probability of success, $P_s$, can be written from Figure 7.3 as follows:

$$P_s = P_x P_y \tag{7.14}$$

Similarly, the failure probability, $P_f$, can be written directly from Figure
7.3 as follows:

$$P_f = \bar{P}_x P_y + P_x \bar{P}_y + \bar{P}_x \bar{P}_y \tag{7.15}$$

$$P_f = 1 - P_x P_y \tag{7.16}$$

It can be noticed from Figure 7.3 that the only way the system task can
be performed successfully is that both tasks x and y must be done
correctly. Therefore the probability of performing the system task correctly is
simply given by $P_x P_y$. This technique is described in more detail in
reference 79.
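The two-task example can be checked by enumerating every path of the probability tree. The following is a minimal sketch assuming statistically independent tasks, as in the example; the function name and the success probabilities (0.9 for task x, 0.8 for task y) are hypothetical.

```python
from itertools import product

def probability_tree(success_probs):
    """Return {path: probability} for a sequence of independent binary tasks.

    Each path is a tuple of booleans (True = task performed correctly).
    """
    paths = {}
    for outcome in product((True, False), repeat=len(success_probs)):
        p = 1.0
        for ok, p_success in zip(outcome, success_probs):
            p *= p_success if ok else 1.0 - p_success
        paths[outcome] = p
    return paths

# Hypothetical values: P_x = 0.9, P_y = 0.8.
paths = probability_tree([0.9, 0.8])
p_success = paths[(True, True)]                       # only all-correct path
p_failure = sum(p for outcome, p in paths.items() if not all(outcome))
```

Summing the three failure paths gives the same value as one minus the product of the two success probabilities, and the enumeration extends unchanged to trees with more than two tasks.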



7.9 HUMAN RELIABILITY ANALYSIS APPLIED TO NUCLEAR
PLANTS

There is no single technique that is readily applicable to nuclear
power plants. A technique such as THERP may be applied to predict
human reliability. However, the following performance-shaping factors [77]
are to be considered in the human reliability analysis when applied to the
nuclear power plant:

1. Training and practice quality.

2. Quality and existence of written instructions as well as the method of
proper usage.

3. Quality of human engineering as applied to the nuclear power plant
controls and displays.

4. Type of the display feedback. For example, there may be too many
displays competing for the operator's attention.

5. Human action independence.

6. Redundancy concerning humans.

7. Psychological stress.

Once these shaping factors have been considered, one should proceed to
estimate the human error rate. Human error rate estimates then should be
included in the Fault Tree Analysis. This type of analysis is probed in
depth in reference 61.

Three-State Device Systems

8.1 INTRODUCTION

A three-state device operates satisfactorily in its normal mode but can fail
in either of the two other modes. Typical examples of such a device are a
fluid flow valve and an electronic diode. Closed (shorted) and open failure
modes pertain to such devices.

Redundancy can generally be used to increase the reliability of a system
without any change in the reliability of the individual devices that form the
system. However, in the case of a system containing three-state devices,
redundancy may either increase or decrease the system reliability. This
depends upon the dominant mode of component failure, the configuration of
the system, and the number of redundant components.

An electronic diode and a fluid flow valve are typical examples of
three-state devices. Either of these components may fail catastrophically in
either the open or closed (shorted) mode. A given three-state device will
then have a probability of failure in the open mode and a probability of
failure in the closed or shorted mode. Because a three-state device cannot
fail simultaneously in both the open and closed (shorted) modes, the
failures are mutually exclusive events. The failure of any one such device is
considered independent of all the others.

Three-state devices can be arranged in various redundant configurations
such as series, parallel, series-parallel, parallel-series, and mixed
arrangements. As these configurations become more complex, the analysis of
networks becomes more cumbersome, and redundancy can result in
decreased overall system reliability. This lower system reliability is due to
the redundancy of the dominant adverse mode of failure.



8.2 LITERATURE REVIEW

Careful consideration of the reliability of three-state devices was presented
by Moore and Shannon [27] and Creveling [7] in their 1956 papers on
electrical and electronic devices. Creveling developed the reliability and
failure equations for a diode quad arrangement, whereas Moore and
Shannon developed formulas for several relay networks.

The year 1957 brought another development when Lipp [25] discussed
the topology of switching elements versus reliability. The following year
Price [29] specifically dealt with the reliability of three-state devices in a
parallel configuration and attempted to optimize the number of redundant
components. In 1960, Barlow and Hunter [1-3] used calculus to optimize
the reliability of series, parallel, series-parallel, and parallel-series networks.
They also computed the number of components that maximize the
expected system life for these first two types of systems, assuming component
life is exponentially distributed.

In 1962 Sorensen [35] applied the theory established by the previous
researchers on three-state device networks to several electronic circuits. His
primary approach was very similar to that of Creveling. In the same year
Cluley [6] published a paper on low-level redundancy as a means of
improving the reliability of digital computers. Also in 1962 James et al. [23]
reviewed the reliability problem and derived some systems reliability
equations for redundant three-state device structures. In 1963, Blake [4]
extended the work of Moore and Shannon [27] on networks of relay
contacts by investigating the open and short circuit failures of hammock
networks. Barlow et al. [3] extended their previous contribution to maximize
the expected system life for components having exponential and
uniform time to failure distributions.

In 1967 Kolesar [24] extended the work of the previous researchers when
he optimized a series-parallel three-state device structure under
constrained conditions. In 1970 Misra and Rao [26] developed a signal flow
graph approach. During the following 2 years, only one of the four studies
making reference to the subject appears to be important. Evans [19] gave a
very brief introduction to three-state device reliabilities in his paper and
Butler [5] made brief reference to it in his publication.

Since 1975 several contributions on the subject have been made by
Dhillon [8-17, 30-34].



8.3 RELIABILITY ANALYSIS OF THREE-STATE DEVICE
NETWORKS

The system reliability equations are developed for several configurations in
this section. More detailed derivations are described in the Appendix.

8.3.1 Series Structure

In a series configuration any one component failing in an open mode
causes system failure, whereas all elements of the system must malfunction
in a shorted mode for the system to fail. The system reliability is given by

$$R_S = \prod_{i=1}^{n}(1 - q_{oi}) - \prod_{i=1}^{n} q_{si} \tag{8.1}$$

where $R_S$ = the series system reliability
$n$ = the number of nonidentical independent three-state components
$q_{oi}$ = the probability of open-mode failure of component i
$q_{si}$ = the probability of short-mode failure of component i

In the case of constant component open and short mode failure rates,
the open and short mode failure probability equations become [8]

$$q_{oi}(t) = \frac{\lambda_{oi}}{\lambda_{oi} + \lambda_{si}}\left[1 - e^{-(\lambda_{oi} + \lambda_{si})t}\right] \tag{8.2}$$

and

$$q_{si}(t) = \frac{\lambda_{si}}{\lambda_{oi} + \lambda_{si}}\left[1 - e^{-(\lambda_{oi} + \lambda_{si})t}\right] \tag{8.3}$$

where $\lambda_{oi}$ = the open-mode constant failure rate
$\lambda_{si}$ = the short-mode constant failure rate
$t$ = time

The derivations of (8.2) and (8.3) are shown in Section 8.5.2. To obtain
(8.2) and (8.3) set $\mu_1 = \mu_2 = 0$ in (8.57) and (8.58), respectively. By
substituting expressions (8.2) and (8.3) in (8.1) we get:

$$R_S(t) = \prod_{i=1}^{n}\left\{1 - \frac{\lambda_{oi}}{\lambda_{oi} + \lambda_{si}}\left[1 - e^{-(\lambda_{oi} + \lambda_{si})t}\right]\right\} - \prod_{i=1}^{n}\left\{\frac{\lambda_{si}}{\lambda_{oi} + \lambda_{si}}\left[1 - e^{-(\lambda_{oi} + \lambda_{si})t}\right]\right\} \tag{8.4}$$



Short Failure Mode Probability. The system short or closed failure mode
probability, $Q_s$, is given by

$$Q_s = \prod_{i=1}^{n} q_{si} \tag{8.5}$$

Open Failure Mode Probability. The probability of open mode failure for a
series system is given by

$$Q_o = 1 - \prod_{i=1}^{n}(1 - q_{oi}) \tag{8.6}$$

where $Q_o$ is the probability of open mode failure of the series network.

Figure 8.1 An identical component series structure unreliability plot.

Plots of (8.5) and (8.6) are shown in Figure 8.1. This figure shows that
the open mode failure probability increases as the number of redundant
components in the series system increases.

Example 1. Consider two independent identical diodes connected in
series. Open and short circuit failure probabilities are 0.2 and 0.1,
respectively. It is required to find the system reliability of the two diodes for this
simple arrangement.

In this case $n = 2$, $q_s = 0.1$, and $q_o = 0.2$. Rewrite (8.1) for two identical
diodes:

$$R_S = (1 - q_o)^2 - q_s^2$$

For the given data,

$$R_S = (1 - 0.2)^2 - (0.1)^2 = 0.63$$
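The series calculation of Example 1 can be reproduced with a few lines of code. This sketch assumes that the constant-rate failure probabilities take the usual competing-failure-mode form, in which each mode receives its share of the total probability of failure by time t. The function names and the failure rates are hypothetical.

```python
import math

def series_reliability(q_open, q_short):
    """Series system: product of (1 - q_oi) minus product of q_si."""
    return math.prod(1.0 - q for q in q_open) - math.prod(q_short)

def failure_probabilities(rate_open, rate_short, t):
    """Open- and short-mode failure probabilities at time t for constant rates."""
    total = rate_open + rate_short
    failed = 1.0 - math.exp(-total * t)
    return rate_open / total * failed, rate_short / total * failed

# Example 1 data: two identical diodes, q_o = 0.2 and q_s = 0.1.
r_example_1 = series_reliability([0.2, 0.2], [0.1, 0.1])

# Hypothetical rates (per hour), evaluated at t = 100 hours.
q_o, q_s = failure_probabilities(rate_open=0.002, rate_short=0.001, t=100.0)
r_at_100h = series_reliability([q_o, q_o], [q_s, q_s])
```

Passing lists rather than a single probability keeps the function usable for nonidentical components.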

8.3.2 Parallel Structure

For a parallel configuration, all the elements must fail in the open mode or
any one of the elements must be in a short mode to cause the system to
fail. The parallel network reliability is given by

$$R_P = \prod_{i=1}^{m}(1 - q_{si}) - \prod_{i=1}^{m} q_{oi} \tag{8.8}$$

where m is the number of nonidentical independent elements.

The open and short failure mode probability plots are the same as
shown in Figure 8.1. Because of duality, the short failure mode probability
replaces the open failure probability and vice versa. The same duality
concept applies to (8.1) and (8.8).

Example 2. Suppose the data of Example 1 is used for a parallel
configuration; evaluate the system reliability by using (8.8):

$$R_P = (1 - q_s)^2 - q_o^2 = (1 - 0.1)^2 - (0.2)^2 = 0.77$$

The parallel system reliability is 0.77.
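The effect of adding redundant three-state elements in parallel can be explored numerically. The sketch below uses the Example 2 data for identical elements and varies the number of elements m; the function name and the range of m are illustrative choices.

```python
def parallel_reliability(q_open, q_short, m):
    """Parallel system of m identical elements: (1 - q_s)**m - q_o**m."""
    return (1.0 - q_short) ** m - q_open ** m

# Example 2 data (q_o = 0.2, q_s = 0.1) for m = 1, ..., 6 parallel elements.
reliabilities = {m: parallel_reliability(0.2, 0.1, m) for m in range(1, 7)}
best_m = max(reliabilities, key=reliabilities.get)
```

With these values the reliability rises from 0.70 for a single element to 0.77 for two elements and then falls again, because every added element is another opportunity for a short-mode failure. This illustrates why redundancy can decrease the reliability of a three-state device system.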

8.3.3 Series-Parallel Network

This is a combination of series and parallel configurations. System
reliability is given by (8.9) for n identical independent units, each containing m
independent elements:

$$R_{SP} = \left[1 - \prod_{i=1}^{m} q_{oi}\right]^n - \left[1 - \prod_{i=1}^{m}(1 - q_{si})\right]^n \tag{8.9}$$

Example 3. Consider the reliability evaluation of series-parallel arrays of
identical fluid flow valves with $q_o = 0.2$, $q_s = 0.1$, $n = 2$, and $m = 4$.

For $n = 2$ and $m = 4$, (8.9) becomes

$$R_{SP} = (1 - q_o^4)^2 - \{1 - (1 - q_s)^4\}^2 \tag{8.10}$$

For $q_s = 0.1$ and $q_o = 0.2$, the system reliability is

$$R_{SP} = (1 - 0.2^4)^2 - \{1 - (1 - 0.1)^4\}^2 = 0.88$$



8.3.4 Parallel-Series Structure

This configuration is a dual of the series-parallel network. The system
reliability equation for a configuration containing m identical units and n
nonidentical series elements becomes

$$R_{PS} = \left[1 - \prod_{i=1}^{n} q_{si}\right]^m - \left[1 - \prod_{i=1}^{n}(1 - q_{oi})\right]^m \tag{8.11}$$

Example 4. Use the data given in Example 3 and evaluate the
parallel-series network reliability. Therefore

$$R_{PS} = (1 - q_s^2)^4 - \{1 - (1 - q_o)^2\}^4 = (1 - 0.1^2)^4 - \{1 - (1 - 0.2)^2\}^4 = 0.9438 \tag{8.12}$$
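The series-parallel and parallel-series results of Examples 3 and 4 can be verified together. This is a small sketch for identical elements only; the function names are hypothetical.

```python
def series_parallel_reliability(q_open, q_short, n, m):
    """n identical units in series, each unit holding m elements in parallel."""
    return (1.0 - q_open ** m) ** n - (1.0 - (1.0 - q_short) ** m) ** n

def parallel_series_reliability(q_open, q_short, n, m):
    """m identical units in parallel, each unit holding n elements in series."""
    return (1.0 - q_short ** n) ** m - (1.0 - (1.0 - q_open) ** n) ** m

# Data of Examples 3 and 4: q_o = 0.2, q_s = 0.1, n = 2, m = 4.
r_sp = series_parallel_reliability(0.2, 0.1, n=2, m=4)
r_ps = parallel_series_reliability(0.2, 0.1, n=2, m=4)
```

The duality between the two structures is visible in the code: swapping the roles of the open and short probabilities, and of n and m, turns one expression into the other.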



8.3.5 Bridge Network 

This configuration is shown in Figure 8.2. The following bridge reliability
equation, $R_b$, is taken from reference 25:

(8.13)

where $Q_{ok}$ is the network open failure mode probability, for $k = 1$, and
$Q_{sk}$ is the network short (closed) failure mode probability, for $k = 2$.

Figure 8.2 A bridge network of dissimilar components.

As shown in Figure 8.2, the bridge network is composed of five elements,
$i = 1, 2, \ldots, 5$, where element number 3 is known as the critical
element.

8.4 DELTA-STAR TRANSFORMATION TECHNIQUE 

The reliability evaluation of series, parallel, and series-parallel networks is 
widely discussed. To evaluate the reliability of a bridge, or other such 
complex structures, the theories in the literature are difficult to apply. The 
delta-star transformation [8] is a simple approach for such problems. This 
technique transforms a complex structure to a series and parallel form. 
Thereon the network reduction technique may be applied to obtain relia- 
bility of transformed configuration. The technique introduces a small 
error, which can be neglected for practical purposes. 

Transformations are carried out in terms of both of the failure modes
instead of simply reliability or unreliability, as is the case for a two-state
device structure.

The resulting delta-star transformation formulas are developed by finding
the leg equivalent, as illustrated by Figure 8.3.

8.4.1 Open-Failure Mode

The delta-star leg equivalents are obtained in the same manner as in the
simpler two-state component case. Figure 8.4 illustrates the leg equivalents
for the open-mode failure case.



Figure 8.3 A delta-star equivalent for the open-mode failure.

Figure 8.4 Delta-star equivalent legs.

Again, by using the independent probability laws for the series and
parallel structures, the equivalent legs of the block diagrams shown in
Figure 8.4a, b, and c result in (8.17), (8.18), and (8.19), respectively.

Series Case. Series system open-mode unreliability

\[ Q_o = 1 - \prod_{i=1}^{n} (1 - q_{oi}) \tag{8.15} \]

where $q_{oi}$ is the open-mode unreliability of component $i$, $i = 1, \ldots, n$.

Parallel Structure Case. Open-mode system unreliability

\[ Q_o = \prod_{i=1}^{n} q_{oi} \tag{8.16} \]

With the aid of (8.15) and (8.16) the equivalent legs of the block diagrams
are transformed, respectively, to the following, where $q_{oA}$, $q_{oB}$,
$q_{oC}$ are the open-mode unreliabilities of the star legs and $q_{oAB}$,
$q_{oBC}$, $q_{oAC}$ are those of the delta legs:

\[ 1 - (1 - q_{oA})(1 - q_{oB}) = \left[1 - (1 - q_{oAC})(1 - q_{oBC})\right] q_{oAB} \tag{8.17} \]
\[ 1 - (1 - q_{oB})(1 - q_{oC}) = \left[1 - (1 - q_{oAB})(1 - q_{oAC})\right] q_{oBC} \tag{8.18} \]
\[ 1 - (1 - q_{oA})(1 - q_{oC}) = \left[1 - (1 - q_{oAB})(1 - q_{oBC})\right] q_{oAC} \tag{8.19} \]



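From these simultaneous equations result the following delta-star
conversion equations:

\[ q_{oA} = 1 - \left\{ \frac{\left[1 - \{1 - (1 - q_{oAC})(1 - q_{oBC})\} q_{oAB}\right]\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oBC})\} q_{oAC}\right]}{\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oAC})\} q_{oBC}\right]} \right\}^{1/2} \tag{8.20} \]

\[ q_{oB} = 1 - \left\{ \frac{\left[1 - \{1 - (1 - q_{oAC})(1 - q_{oBC})\} q_{oAB}\right]\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oAC})\} q_{oBC}\right]}{\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oBC})\} q_{oAC}\right]} \right\}^{1/2} \tag{8.21} \]

\[ q_{oC} = 1 - \left\{ \frac{\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oAC})\} q_{oBC}\right]\left[1 - \{1 - (1 - q_{oAB})(1 - q_{oBC})\} q_{oAC}\right]}{\left[1 - \{1 - (1 - q_{oAC})(1 - q_{oBC})\} q_{oAB}\right]} \right\}^{1/2} \tag{8.22} \]

Figure 8.5 A short-failure delta-star transformation.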



8.4.2 Short-Failure Mode

Similarly, as for the open-failure mode, Figures 8.5 and 8.6 show the
short-failure mode equivalent configurations.

Again, with the aid of the independent probability laws for parallel and
series structures, (8.25)-(8.27) are obtained from the corresponding
equivalent legs of the block diagrams of Figure 8.6a-c.

Series Case. System short-mode unreliability

\[ Q_s = \prod_{i=1}^{n} q_{si} \tag{8.23} \]

where $q_{si}$ is the short-mode unreliability of component $i$, $i = 1, \ldots, n$.

Parallel Structure Case.

\[ Q_s = 1 - \prod_{i=1}^{n} (1 - q_{si}) \tag{8.24} \]

With application of (8.23) and (8.24) to the equivalent legs of the block
diagrams of Figure 8.6a-c, the corresponding equations become

\[ q_{sA} q_{sB} = 1 - (1 - q_{sAB})(1 - q_{sAC} q_{sBC}) \tag{8.25} \]
\[ q_{sB} q_{sC} = 1 - (1 - q_{sBC})(1 - q_{sAB} q_{sAC}) \tag{8.26} \]
\[ q_{sA} q_{sC} = 1 - (1 - q_{sAC})(1 - q_{sAB} q_{sBC}) \tag{8.27} \]





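Figure 8.6 Delta-star equivalent legs.

Solving these equations simultaneously yields

\[ q_{sA} = \left\{ \frac{\left[1 - (1 - q_{sAB})(1 - q_{sAC} q_{sBC})\right]\left[1 - (1 - q_{sAC})(1 - q_{sAB} q_{sBC})\right]}{\left[1 - (1 - q_{sBC})(1 - q_{sAB} q_{sAC})\right]} \right\}^{1/2} \tag{8.28} \]

\[ q_{sB} = \left\{ \frac{\left[1 - (1 - q_{sAB})(1 - q_{sAC} q_{sBC})\right]\left[1 - (1 - q_{sBC})(1 - q_{sAB} q_{sAC})\right]}{\left[1 - (1 - q_{sAC})(1 - q_{sAB} q_{sBC})\right]} \right\}^{1/2} \tag{8.29} \]

\[ q_{sC} = \left\{ \frac{\left[1 - (1 - q_{sBC})(1 - q_{sAB} q_{sAC})\right]\left[1 - (1 - q_{sAC})(1 - q_{sAB} q_{sBC})\right]}{\left[1 - (1 - q_{sAB})(1 - q_{sAC} q_{sBC})\right]} \right\}^{1/2} \tag{8.30} \]

It is readily seen that (8.28), (8.29), and (8.30) are all interrelated. After
computing the unreliability value by use of the first equation, the
computation for the other two is made easier because the first computation
is used in their evaluation.

The same line of argument applies to the open-failure equations (8.20),
(8.21), and (8.22).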
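The conversion formulas can be checked numerically. The following Python sketch is illustrative only: the function names `delta_to_star_open` and `delta_to_star_short` and the sample values ($q_o = 0.1$ and $q_s = 0.2$ on every delta leg) are assumptions made for the example, and the code assumes the symmetric square-root forms of (8.20)-(8.22) and (8.28)-(8.30) with all probabilities strictly between 0 and 1.

```python
import math


def delta_to_star_open(q_ab, q_bc, q_ac):
    """Open-mode star legs (qA, qB, qC) from open-mode delta legs.

    Each x_* term is one minus a delta leg equivalent, i.e. the product
    (1 - q_i)(1 - q_j) of the two star legs that replace that leg.
    """
    x_ab = 1.0 - (1.0 - (1.0 - q_ac) * (1.0 - q_bc)) * q_ab  # (1-qA)(1-qB)
    x_bc = 1.0 - (1.0 - (1.0 - q_ab) * (1.0 - q_ac)) * q_bc  # (1-qB)(1-qC)
    x_ac = 1.0 - (1.0 - (1.0 - q_ab) * (1.0 - q_bc)) * q_ac  # (1-qA)(1-qC)
    q_a = 1.0 - math.sqrt(x_ab * x_ac / x_bc)
    q_b = 1.0 - math.sqrt(x_ab * x_bc / x_ac)
    q_c = 1.0 - math.sqrt(x_bc * x_ac / x_ab)
    return q_a, q_b, q_c


def delta_to_star_short(q_ab, q_bc, q_ac):
    """Short-mode star legs (qA, qB, qC) from short-mode delta legs.

    Each y_* term is a delta leg equivalent, i.e. the product q_i * q_j
    of the two star legs that replace that leg.
    """
    y_ab = 1.0 - (1.0 - q_ab) * (1.0 - q_ac * q_bc)  # qA * qB
    y_bc = 1.0 - (1.0 - q_bc) * (1.0 - q_ab * q_ac)  # qB * qC
    y_ac = 1.0 - (1.0 - q_ac) * (1.0 - q_ab * q_bc)  # qA * qC
    q_a = math.sqrt(y_ab * y_ac / y_bc)
    q_b = math.sqrt(y_ab * y_bc / y_ac)
    q_c = math.sqrt(y_bc * y_ac / y_ab)
    return q_a, q_b, q_c


if __name__ == "__main__":
    # Illustrative symmetric delta: every leg has q_o = 0.1 and q_s = 0.2.
    print(delta_to_star_open(0.1, 0.1, 0.1))
    print(delta_to_star_short(0.2, 0.2, 0.2))
```

For this symmetric delta each star leg comes out at roughly 0.0095 in the open mode and roughly 0.482 in the short mode. Because the three bracketed leg terms are shared between the formulas, they are computed once and reused.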



Example 1. A bridge network example is solved here to illustrate the use
of these formulas. The network shown in Figure 8.7 is one where the delta
configuration is identified with the labels A, B, and C.

Figure 8.7 A three-state bridge structure.

The equivalent open and short failure mode probability values for this
situation are obtained by using (8.20)-(8.22) and (8.28)-(8.30),
respectively. The numerical results obtained are as follows:

Open-mode failure probability:

$q_{oA} = 0.01 \qquad q_{oB} = 0.01 \qquad q_{oC} = 0.01$

and short-mode failure probability:

$q_{sA} = 0.482 \qquad q_{sB} = 0.482 \qquad q_{sC} = 0.482$



These relationships allow Figure 8.7 to be redrawn as its equivalent,
shown in Figure 8.8. The resulting total open- and short-mode probabilities
of failure for Figure 8.8 are

\[ Q_o = 1 - \left[1 - \{1 - (1 - q_{oB})(1 - q_{o1})\}\{1 - (1 - q_{oC})(1 - q_{o2})\}\right]\left[1 - q_{oA}\right] \tag{8.31} \]

and

\[ Q_s = \left[1 - (1 - q_{sB} q_{s1})(1 - q_{sC} q_{s2})\right] q_{sA} \tag{8.32} \]

where the subscripts 1 and 2 denote the two bridge elements that remain
outside the delta. By using (8.31) and (8.32)

$Q_o = 0.022 \qquad Q_s = 0.088$

Figure 8.8 A transformed bridge structure.

thereby giving the bridge reliability $R = 1 - (Q_o + Q_s) = 1 - (0.022 + 0.088) = 0.890$.



8.5 REPAIRABLE THREE-STATE DEVICE SYSTEMS 

This section presents several mathematical models of repairable systems.
Most of these models are available in the referenced literature.



8.5.1 Analysis of a Three-State System with Two Types of Components

This model is analyzed by the supplementary variables technique. The
components of the system are divided into two classes (Class I and
Class II). If any one component of Class I fails, the system will
experience a complete system failure. A component failure of Class II will
cause a catastrophic system failure. Typical examples of such a system are
automatic machines and fluid flow valves; typical failure events are a
system that jams so that rotation is blocked and a shaft that shears.

System states are defined as follows:

1. Normal state. The successful functioning of a device.

2. Complete failure state. The system does not operate at all; this is
normally caused by the failure of a Class I component.

3. Catastrophic failure state. A system failure state caused by the failure
of a Class II component.

Suppose an automatic machine carries out some operations on assembly
line items. The automatic machine is composed of many component parts;
therefore, the components of the machine can be divided into two classes
(i.e., Class I and II). A component failure of Class I causes the complete
failure or breakdown of the automatic machine. A failure of any one
component of Class II will cause a catastrophic failure of the automatic
system (this type of failure will initiate some unwanted operations on the
assembly line items).

Obviously, to restore the automatic machine to its normal state, repair is
necessary. Repair times are arbitrarily distributed.

The following notations and definitions are used to formulate this
Markov model:

$P_0(t)$ = the probability of the system being in its normal mode at
time $t$.

$P_{1,i}(y,t)\Delta$ = the probability that at time $t$ the system, which has
failed because of the failure of its $i$th component in Class I, is being
repaired and the elapsed repair time lies in the interval $(y, y+\Delta)$,
for $i = 1, 2, 3, \ldots, n$.

$P_{2,i}(x,t)\Delta$ = the probability that at time $t$ the system, which has
failed because of the failure of its $i$th component in Class II, is being
repaired and the elapsed repair time lies in the interval $(x, x+\Delta)$,
for $i = 1, 2, 3, \ldots, n$.

$\eta_i(y)\Delta$ = the first-order probability that the $i$th component of
Class I is repaired in the interval $(y, y+\Delta)$, conditioned that it was
not repaired up to time $y$.

$\mu_i(x)\Delta$ = the first-order probability that the $i$th component of
Class II is repaired in the interval $(x, x+\Delta)$, conditioned that it was
not repaired up to time $x$.

$\lambda_i$ = the constant failure rate of the $i$th component of Class II.

$\gamma_i$ = the constant failure rate of the $i$th component of Class I.

$s$ = the Laplace transform variable.

Assumptions 

1. Failures are statistically independent. 

2. A failed system is restored as good as new. 

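A Mathematical Model. The integro-differential equations (and associated
boundary and initial conditions) associated with Figure 8.9 are

\[ \left[\frac{d}{dt} + \lambda + \gamma\right] P_0(t) = \sum_{i=1}^{n} \int_0^\infty P_{1,i}(y,t)\,\eta_i(y)\,dy + \sum_{i=1}^{n} \int_0^\infty P_{2,i}(x,t)\,\mu_i(x)\,dx \tag{8.33} \]

\[ \left[\frac{\partial}{\partial t} + \frac{\partial}{\partial y} + \eta_i(y)\right] P_{1,i}(y,t) = 0 \tag{8.34} \]

\[ \left[\frac{\partial}{\partial t} + \frac{\partial}{\partial x} + \mu_i(x)\right] P_{2,i}(x,t) = 0 \tag{8.35} \]

\[ P_{1,i}(0,t) = \gamma_i P_0(t), \qquad P_{2,i}(0,t) = \lambda_i P_0(t) \]

$P_0(0) = 1$; at $t = 0$ all other initial condition probabilities are zero, where

\[ \lambda = \sum_{i=1}^{n} \lambda_i, \qquad \gamma = \sum_{i=1}^{n} \gamma_i \]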

Solving the above integro-differential equations by Laplace transforms and
integration (including some substitutions) will yield



I 



■ - 1 r- 1 



(8.36) 



where 



Since 



<? : » = jf V'>,(j >exp|| - jf * Mf < *) <fc J dx 



(8 37) 
(8,38) 



where $P_{1,i}(s)$, $P_{2,i}(s)$ are the Laplace transforms of the probabilities
$P_{1,i}(t)$, $P_{2,i}(t)$ that the system is under repair due to the failure of
the $i$th component in Classes I and II, respectively. Therefore,



(8.39) 



for /-1,2; /- 1,2,3, /jjAj-y,;*^^ 



The Laplace transforms of the probabilities $P_1(t)$ and $P_2(t)$ that the
system is under repair due to the failure of any one of the Class I and
Class II components, respectively, are

\[ P_j(s) = \sum_{i=1}^{n} P_{j,i}(s), \qquad j = 1, 2 \tag{8.40} \]

Substituting (8.39) into (8.40) yields




for /«t*2; Aj—X, (*- 4 U 

Therefore, for given repair probability density functions $g_{j,i}(t)$, the
state probabilities $P_0(t)$, $P_j(t)$ can be obtained by simply taking the
inverse Laplace transforms of (8.36) and (8.41), respectively.

The steady-state solution, if it exists, of (8.36) and (8.41) can be obtained
by applying Abel's theorem to the Laplace transforms,

\[ \lim_{s \to 0} s f(s) = \lim_{t \to \infty} f(t) \tag{8.42} \]

Mean time to system failure (MTSF), if it exists, can be obtained from

\[ \text{MTSF} = \lim_{s \to 0} P_0(s) \tag{8.43} \]

More detailed analyses of similar models using the method of
supplementary variables are presented in references 39-41.

8.5.2 A Repairable Three-State Device with Constant Failure 
and Repair Rates 

This model [30] is a special case of the model presented in Section 8.5.1. 
The system transition diagram is shown in Figure 8.10. 

Assumptions 

1. Failures are statistically independent.

2. The repaired system is as good as new.

3. Repair and failure rates are constant.

Notation 

$P_i(t)$ = the probability of the state in question at time $t$; $i = 0, 1, 2$
$\lambda_i$ = the constant failure rate in question, $i = 1, 2$
$\mu_i$ = the constant repair rate in question, $i = 1, 2$

Figure 8.10 A repairable Markov model. [States: $P_0(t)$, normal state;
$P_1(t)$, open mode failure state; $P_2(t)$, short mode failure state.]



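From Figure 8.10, the resulting differential equations are:

\[ P_0'(t) = -(\lambda_1 + \lambda_2) P_0(t) + \mu_1 P_1(t) + \mu_2 P_2(t) \tag{8.44} \]
\[ P_1'(t) = \lambda_1 P_0(t) - \mu_1 P_1(t) \tag{8.45} \]
\[ P_2'(t) = \lambda_2 P_0(t) - \mu_2 P_2(t) \tag{8.46} \]
\[ P_0(0) = 1, \qquad P_1(0) = P_2(0) = 0 \]

The Laplace transform of (8.44)-(8.46) yields

\[ (s + \lambda_1 + \lambda_2) P_0(s) - \mu_1 P_1(s) - \mu_2 P_2(s) = 1 \tag{8.47} \]
\[ -\lambda_1 P_0(s) + (s + \mu_1) P_1(s) + 0 \cdot P_2(s) = 0 \tag{8.48} \]
\[ -\lambda_2 P_0(s) + 0 \cdot P_1(s) + (s + \mu_2) P_2(s) = 0 \tag{8.49} \]

The coefficients of the above simultaneous equations can be written as
follows:

\[ \begin{bmatrix} (s + \lambda_1 + \lambda_2) & -\mu_1 & -\mu_2 \\ -\lambda_1 & (s + \mu_1) & 0 \\ -\lambda_2 & 0 & (s + \mu_2) \end{bmatrix} \begin{bmatrix} P_0(s) \\ P_1(s) \\ P_2(s) \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]

The solution by Cramer's rule yields:

\[ P_0(s) = \frac{(s + \mu_1)(s + \mu_2)}{s\left[s^2 + s(\mu_1 + \mu_2 + \lambda_1 + \lambda_2) + (\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1)\right]} \tag{8.50} \]

\[ P_1(s) = \frac{\lambda_1 (s + \mu_2)}{s\left[s^2 + s(\mu_1 + \mu_2 + \lambda_1 + \lambda_2) + (\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1)\right]} \tag{8.51} \]

\[ P_2(s) = \frac{\lambda_2 (s + \mu_1)}{s\left[s^2 + s(\mu_1 + \mu_2 + \lambda_1 + \lambda_2) + (\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1)\right]} \tag{8.52} \]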

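The roots of the denominators of (8.50)-(8.52) are $s = 0$ and

\[ k_1, k_2 = \frac{-(\mu_1 + \mu_2 + \lambda_1 + \lambda_2) \pm \sqrt{(\mu_1 + \mu_2 + \lambda_1 + \lambda_2)^2 - 4(\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1)}}{2} \]

Now, (8.50)-(8.52) can be expanded in partial fraction form:

\[ P_0(s) = \frac{\mu_1\mu_2}{k_1 k_2}\,\frac{1}{s} + \frac{(k_1 + \mu_1)(k_1 + \mu_2)}{k_1 (k_1 - k_2)}\,\frac{1}{s - k_1} - \frac{(k_2 + \mu_1)(k_2 + \mu_2)}{k_2 (k_1 - k_2)}\,\frac{1}{s - k_2} \tag{8.53} \]

\[ P_1(s) = \frac{\lambda_1\mu_2}{k_1 k_2}\,\frac{1}{s} + \frac{\lambda_1 (k_1 + \mu_2)}{k_1 (k_1 - k_2)}\,\frac{1}{s - k_1} - \frac{\lambda_1 (k_2 + \mu_2)}{k_2 (k_1 - k_2)}\,\frac{1}{s - k_2} \tag{8.54} \]

\[ P_2(s) = \frac{\lambda_2\mu_1}{k_1 k_2}\,\frac{1}{s} + \frac{\lambda_2 (k_1 + \mu_1)}{k_1 (k_1 - k_2)}\,\frac{1}{s - k_1} - \frac{\lambda_2 (k_2 + \mu_1)}{k_2 (k_1 - k_2)}\,\frac{1}{s - k_2} \tag{8.55} \]

In the time domain, (8.53)-(8.55) become

\[ P_0(t) = \frac{\mu_1\mu_2}{k_1 k_2} + \frac{(k_1 + \mu_1)(k_1 + \mu_2)}{k_1 (k_1 - k_2)}\,e^{k_1 t} - \frac{(k_2 + \mu_1)(k_2 + \mu_2)}{k_2 (k_1 - k_2)}\,e^{k_2 t} \tag{8.56} \]

\[ P_1(t) = \frac{\lambda_1\mu_2}{k_1 k_2} + \frac{\lambda_1 (k_1 + \mu_2)}{k_1 (k_1 - k_2)}\,e^{k_1 t} - \frac{\lambda_1 (k_2 + \mu_2)}{k_2 (k_1 - k_2)}\,e^{k_2 t} \tag{8.57} \]

\[ P_2(t) = \frac{\lambda_2\mu_1}{k_1 k_2} + \frac{\lambda_2 (k_1 + \mu_1)}{k_1 (k_1 - k_2)}\,e^{k_1 t} - \frac{\lambda_2 (k_2 + \mu_1)}{k_2 (k_1 - k_2)}\,e^{k_2 t} \tag{8.58} \]

Since

\[ k_1 k_2 = \mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1 \]
\[ k_1 + k_2 = -(\mu_1 + \mu_2 + \lambda_1 + \lambda_2) \]

the addition of (8.56)-(8.58) will yield unity, that is,

\[ P_0(t) + P_1(t) + P_2(t) = 1 \]

The equipment availability is

\[ \text{Availability} = P_0(t) = \frac{\mu_1\mu_2}{k_1 k_2} + \frac{(k_1 + \mu_1)(k_1 + \mu_2)}{k_1 (k_1 - k_2)}\,e^{k_1 t} - \frac{(k_2 + \mu_1)(k_2 + \mu_2)}{k_2 (k_1 - k_2)}\,e^{k_2 t} \]

The availability expression is valid if and only if $k_1$ and $k_2$ are
negative. As $t$ becomes very large, the steady-state availability equation
can be expressed as

\[ \lim_{t \to \infty} P_0(t) = \frac{\mu_1\mu_2}{k_1 k_2} = \frac{\mu_1\mu_2}{\mu_1\mu_2 + \lambda_1\mu_2 + \lambda_2\mu_1} \tag{8.59} \]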
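The closed-form state probabilities of this three-state model are easy to evaluate numerically. The Python sketch below is an illustration, not part of the referenced model [30]: the function names and the rate values in the usage line are hypothetical, and the code assumes strictly positive rates with distinct roots $k_1 \neq k_2$, writing each state probability as a constant term plus two exponential terms in $k_1$ and $k_2$.

```python
import math


def three_state_probabilities(t, lam1, lam2, mu1, mu2):
    """Return (P0, P1, P2) at time t for a repairable three-state device.

    lam1, mu1: failure and repair rates of the open mode (state 1).
    lam2, mu2: failure and repair rates of the short mode (state 2).
    """
    total = lam1 + lam2 + mu1 + mu2
    prod = mu1 * mu2 + lam1 * mu2 + lam2 * mu1  # equals k1 * k2
    root = math.sqrt(total * total - 4.0 * prod)
    if root == 0.0:
        raise ValueError("repeated root: this closed form needs k1 != k2")
    k1 = (-total + root) / 2.0
    k2 = (-total - root) / 2.0

    def state(constant, numerator):
        # constant/(k1 k2) + N(k1) e^{k1 t}/(k1 (k1-k2)) - N(k2) e^{k2 t}/(k2 (k1-k2))
        return (constant / prod
                + numerator(k1) / (k1 * (k1 - k2)) * math.exp(k1 * t)
                - numerator(k2) / (k2 * (k1 - k2)) * math.exp(k2 * t))

    p0 = state(mu1 * mu2, lambda k: (k + mu1) * (k + mu2))
    p1 = state(lam1 * mu2, lambda k: lam1 * (k + mu2))
    p2 = state(lam2 * mu1, lambda k: lam2 * (k + mu1))
    return p0, p1, p2


def steady_state_availability(lam1, lam2, mu1, mu2):
    """Limit of P0(t) as t grows large."""
    return mu1 * mu2 / (mu1 * mu2 + lam1 * mu2 + lam2 * mu1)


if __name__ == "__main__":
    # Hypothetical rates (per hour), chosen only for illustration.
    print(three_state_probabilities(10.0, 0.01, 0.02, 0.5, 0.4))
    print(steady_state_availability(0.01, 0.02, 0.5, 0.4))
```

A quick sanity check on any set of rates is that the three probabilities sum to one at every $t$ and that $P_0(t)$ approaches the steady-state value for large $t$.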



8.5.3 A Mixed Markov Model with Two Three-State Devices
(Master-Slave Relationship)

This mixed Markov model [34] has the two units modeled in series. One
device has normal, partial, and catastrophic states and the other has
normal, open, and closed mode states (Type II). Repairs are performed
only when an equipment fails in its partial mode.

A typical example of such a system is a fluid flow valve commanded
from an instrumentation control panel, where the control panel represents
the first type of device (Master) and the fluid flow valve represents the
second type (Slave). Such practical examples are numerous and may often
be encountered in a modern electrical power station. The transition
diagram for this case is shown in Figure 8.11.



Abbreviations and Notations

$P_{(\cdot)(\cdot)}(t)$ = probability of the state in question, at time $t$
$N_i$ = normal mode state of the three-state devices (i.e., master and
slave), respectively, $i = 1, 2$
$C_1$ = catastrophic failure state of the "master" three-state device
$C_2$ = closed mode failure state of the "slave" three-state device
$F_1$ = partial failure state of the "master" three-state device
$U_2$ = open mode failure state of the "slave" three-state device
$\lambda_1$ = constant partial failure rate of the "master" three-state device
$\lambda_2$ = constant catastrophic failure rate of the "master" three-state device
$\lambda_5$ = constant failure rate from partial to catastrophic failure state of
the "master" three-state device
$\lambda_3$ = constant close mode failure rate of the slave three-state device
$\lambda_4$ = constant open mode failure rate of the slave three-state device
$\mu_1$ = constant repair rate of the master device
$t$ = time
$\Delta t$ = time interval

Assumptions 

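1. Failures are statistically independent.

2. The repaired system is as good as new.

3. Failure and repair rates are constant.

The state differential equations resulting from Figure 8.11 are

\[ \frac{dP_{N_1N_2}(t)}{dt} + (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4) P_{N_1N_2}(t) = \mu_1 P_{F_1N_2}(t) \tag{8.60} \]

\[ \frac{dP_{F_1N_2}(t)}{dt} + (\lambda_3 + \lambda_4 + \lambda_5 + \mu_1) P_{F_1N_2}(t) = \lambda_1 P_{N_1N_2}(t) \tag{8.61} \]

Solving the above differential equations by Laplace transform yields

\[ P_{N_1N_2}(s) = \frac{\lambda_3 + \lambda_4 + \lambda_5 + \mu_1 + s}{(s - k_1)(s - k_2)} \tag{8.62} \]

\[ P_{F_1N_2}(s) = \frac{\lambda_1}{(s - k_1)(s - k_2)} \tag{8.63} \]

where

\[ k_1, k_2 = \frac{-N \pm \sqrt{N^2 - 4AM}}{2A} \]

and

\[ A = 1 \]
\[ N = \lambda_1 + \lambda_2 + 2\lambda_3 + 2\lambda_4 + \lambda_5 + \mu_1 \]
\[ M = \lambda_1\lambda_3 + \lambda_2\lambda_3 + \lambda_3^2 + 2\lambda_3\lambda_4 + \lambda_1\lambda_4 + \lambda_2\lambda_4 + \lambda_4^2 + \lambda_1\lambda_5 + \lambda_2\lambda_5 + \lambda_3\lambda_5 + \lambda_4\lambda_5 + \lambda_2\mu_1 + \lambda_3\mu_1 + \lambda_4\mu_1 \]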
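The roots and the two operating-state probabilities can be evaluated numerically. The Python sketch below is a hedged illustration: it assumes the model reduces to two transient states, $N_1N_2$ (both devices normal) and $F_1N_2$ (master in partial failure, slave normal), that the system starts in $N_1N_2$, and that $k_1 \neq k_2$. The function name and the rate values are hypothetical.

```python
import math


def master_slave_probabilities(t, lam1, lam2, lam3, lam4, lam5, mu1):
    """Return (P_N1N2, P_F1N2, reliability) at time t.

    Assumes P_N1N2(0) = 1 and inverts
      P_N1N2(s) = (s + c) / ((s - k1)(s - k2)),
      P_F1N2(s) = lam1 / ((s - k1)(s - k2)),
    where c = lam3 + lam4 + lam5 + mu1.
    """
    a = lam1 + lam2 + lam3 + lam4   # total exit rate of state N1N2
    c = lam3 + lam4 + lam5 + mu1    # total exit rate of state F1N2
    n = a + c                       # coefficient N
    m = a * c - lam1 * mu1          # coefficient M (expanded form in the text)
    # n^2 - 4m = (a - c)^2 + 4*lam1*mu1, so the roots are always real.
    root = math.sqrt(n * n - 4.0 * m)
    if root == 0.0:
        raise ValueError("repeated root: this closed form needs k1 != k2")
    k1 = (-n + root) / 2.0
    k2 = (-n - root) / 2.0
    e1, e2 = math.exp(k1 * t), math.exp(k2 * t)
    p_nn = ((k1 + c) * e1 - (k2 + c) * e2) / (k1 - k2)
    p_fn = lam1 * (e1 - e2) / (k1 - k2)
    return p_nn, p_fn, p_nn + p_fn


if __name__ == "__main__":
    # Hypothetical rates (per hour), chosen only for illustration.
    print(master_slave_probabilities(10.0, 0.02, 0.005, 0.01, 0.01, 0.03, 0.5))
```

The sum of the two operating-state probabilities is the quantity of interest, since every other state of the series system is a failed state.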

Therefore,

\[ \text{System reliability} = P_{N_1N_2}(t) + P_{F_1N_2}(t) \tag{8.64} \]

8.5.4 A Repairable Markov Model of Two Units in Series I

Consider two three-state devices arranged in a series configuration [34].
The repair is performed only when one of the devices fails in its closed
mode, assuming the other one is still operating. Two fluid flow valves
operating in series represent a good example. The transition diagram is
shown in Figure 8.12.

Abbreviations and Notations

$P_{(\cdot)(\cdot)}(t)$ = probability of the state in question, at time $t$
$N_i$ = normal mode state of both three-state devices, $i = 1, 2$
$C_1$ = close mode failure state of the first three-state device
$C_2$ = close mode failure state of the second three-state device
$\mu_1$ = constant repair rate of the first three-state device
$\mu_2$ = constant repair rate of the second three-state device
$\lambda_1$ = constant close mode failure rate of the first three-state device
$\lambda_2$ = constant close mode failure rate of the second three-state device
$\lambda_3$ = constant open mode failure rate of the first three-state device
$\lambda_4$ = constant open mode failure rate of the second three-state device
$t$ = time
$\Delta t$ = time interval
$s$ = Laplace transform variable

Assumptions

1. Failures are statistically independent.

2. A repaired device is as good as new.

3. Failure and repair rates are constant.

Figure 8.12 A series two-unit repairable Markov model.

The differential equations associated with Figure 8.12 are

\[ \frac{dP_{N_1N_2}(t)}{dt} + (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4) P_{N_1N_2}(t) = P_{C_1N_2}(t)\,\mu_1 + P_{N_1C_2}(t)\,\mu_2 \tag{8.65} \]

dP c ,4') 



dt 



+{\ 2 +x J+Ml )/W0-iW,) Al 



r,, N p)=o /' V|tj (0)=o 



(8.66) 
(8.6 



The values of $P_{N_1N_2}(s)$, $P_{C_1N_2}(s)$, and $P_{N_1C_2}(s)$ are obtained from the above
differential equations:



(8. 68) 



where A ! 



(j+A^Ai+Aj + AJ - Ml 

-A, (f+Aj+Aj+^,) 



-Mi 
0 



"An 



A,fj+A, +A 4 + fi :i ) 



-A 2 (j+A z +A i +^i l ) 



(j+A.+Aj+Aj) 
(8.69) 

(8.70) 



The steady-state solutions (if they exist) of (8.68)-(8.70) can be obtained
by applying Abel's theorem to the Laplace transforms, that is,

\[ \lim_{s \to 0} s f(s) = \lim_{t \to \infty} f(t) \tag{8.71} \]



8.5.5 A Repairable Markov Model of Two Units in Series II
(Partial and Catastrophic Failure Modes)

Consider two three-state devices arranged in series [34]. A three-state
device is repaired only when it fails in a partial mode (i.e., the other
three-state device is operating successfully) or both devices are operating in
their partial failure mode. Two automatic machines performing some
operations on the assembly line items represent a typical example. The
transition diagram for this series configuration is shown in Figure 8.13.

Figure 8.13 A series two-unit repairable Markov model.



Assumptions 

1. Failures are statistically independent.

2. The repaired system is as good as new.

3. Failure and repair rates are constant.

Abbreviations and Notations

$P_{(\cdot)(\cdot)}(t)$ = probability of the state in question at time $t$
$N_i$ = normal state of both three-state devices, $i = 1, 2$
$C_1$ = catastrophic failure state of the first three-state device
$C_2$ = catastrophic failure state of the second three-state device
$F_1$ = partial failure state of the first three-state device
$F_2$ = partial failure state of the second three-state device
$\lambda_i$ = constant partial failure rates of both the devices, respectively,
$i = 1, 2$
$\mu_i$ = constant catastrophic failure rates of both the devices,
respectively, $i = 1, 2$
$\gamma_i$ = constant system repair rates, $i = 1, 2, 3$
$\lambda_3$ = constant failure rate from partial to catastrophic failure mode of
the first unit or device
$\lambda_4$ = constant failure rate from partial to catastrophic failure mode of
the second unit
$t$ = time
$\Delta t$ = time interval
$s$ = Laplace transform variable

The differential equations associated with Figure 8.13 are

dt +A > + ^ +^) / '^ 1 (')=ri/ , V) (')+7 1 / , ^ i (/)+T J /' Vi 

(8.72) 

—ft— +lK+^+y>+\<)P tttFl (t)-P lltM jlt)\ 1 (8.73) 

d,' • +Mj +7: (8.74) 
dP^ Fi (t) 

# + X * + V^ F r l 4 i )- F *,4')* 1 +r MiP p)\ l (8.75) 

$P_{N_1N_2}(0) = 1$; at $t = 0$ all other initial condition probabilities are zero.
The values of $P_{N_1N_2}(s)$, $P_{F_1N_2}(s)$, $P_{N_1F_2}(s)$, and $P_{F_1F_2}(s)$ are obtained
from the above differential equations:



P m _ -^■ + *i + y3+^Xj + ft 2 + A i + Y2+A 3 )(*+A i -f A 4 + y ,) 
* 1 ■ 4 

(8.76) 



A- 



<*+A|+Aj+jl,+fij) -T t yj 

/> r -\_ ~ { A iM*+f j +yj +A 3 )) +( Mt +A, + y ,-t-^)A,A, 
W £ - • 

(8.77) 

r *i*,W ~ — - (8 .79) 



8.5.6 A Two-Failure-Mode System with Cold Stand-By Units

This mathematical model [16] represents a system with two-failure-mode
units and N stand-by units. The operational unit can be repaired in one of
its failure modes. This may be regarded as a minor failure mode, in a case
where the on-line failures can be repaired at the place of equipment
installation or the unit repair time is less than the unit replacement time.
When the unit repair is costly and time consuming, the failed unit is
replaced with one of the stand-by units. Typical examples of such a system
are production line machinery, transformers, motors, heavy duty electrical
switches, and so on.

Assumptions

1. It is assumed that the repaired or replaced unit is as good as new.

2. The unit repair rate is faster than the unit replacement rate.

3. The system fails only when the last standby unit fails.

4. A unit failed in its catastrophic mode is never repaired.

5. Failures are statistically independent.

6. A unit has two failure modes. Units cannot fail in their standby mode.

Mathematical Model. The transition diagram of this system is shown in
Figure 8.14. The following definitions and notations are used to formulate
this mathematical model:

$N$ = number of identical standby units
$n$ = last state number of the system
$\mu_i$ = constant replacement and repair rates, respectively, of the
operational unit, for $i = 1, 2$ and $\mu_2 > \mu_1$
$\lambda_1$ = constant unit replacement mode failure rate
$\lambda_2$ = constant noncatastrophic mode (repairable mode) failure rate
$t$ = time
$s$ = Laplace transform variable
$P_0(t)$ = unit operational mode probability at time $t$
$P_1(t)$ = unit repairable mode probability at time $t$
$P_{k-i}(t)$ = unit failure, system operational, and system repairable mode
probabilities at time $t$ for $i = 2, 1, 0$, respectively, and
$k = 4, 7, 10, \ldots, (n-1)$
$P_n(t)$ = system failure mode probability at time $t$

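The system differential equations for the Figure 8.14 model are

\[ P_0'(t) = -(\lambda_1 + \lambda_2) P_0(t) + P_1(t)\,\mu_2 \tag{8.80} \]
\[ P_1'(t) = -\mu_2 P_1(t) + P_0(t)\,\lambda_2 \tag{8.81} \]
\[ P_{k-2}'(t) = -\mu_1 P_{k-2}(t) + P_{k-3}(t)\,\lambda_1 \tag{8.82} \]
\[ P_{k-1}'(t) = -(\lambda_1 + \lambda_2) P_{k-1}(t) + P_k(t)\,\mu_2 + P_{k-2}(t)\,\mu_1 \tag{8.83} \]
\[ P_k'(t) = -\mu_2 P_k(t) + P_{k-1}(t)\,\lambda_2 \tag{8.84} \]

for $k = 4, 7, 10, \ldots, (n-1)$

\[ P_n'(t) = P_{n-2}(t)\,\lambda_1 \tag{8.85} \]

At $t = 0$, $P_0(0) = 1$; other initial condition probabilities are equal to zero.

\[ n = 3(N + 1) - 1 \quad \text{for } N \geq 1 \tag{8.86} \]

where the prime denotes differentiation with respect to time $t$. The Laplace
transforms of the solution are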

«d-£f#k (892) 



8.5.7 Availability Analysis of a Two-Failure Modes System with
Nonrepairable Stand-by Units

This model considers a system containing N identical units of which one is
functioning and (N - 1) are standbys. As soon as the operational unit fails
in any one of the two failure modes, it is replaced by one of the (N - 1)
standby units. The system continues to function until the last standby unit
has failed. The transition diagram of the Markov model is shown in Figure
8.15.




Notation

$X_i$ = system operational [i.e., for $i = 0, 3, 6, 9, \ldots, (n-2)$], failure mode
type I [i.e., for $i = 1, 4, 7, 10, \ldots, (n-1)$], and failure mode type II
(i.e., for $i = 2, 5, 8, \ldots, n$) states
$P_i(t)$ = system operational [i.e., for $i = 0, 3, 6, 9, \ldots, (n-2)$], failure
mode type I [i.e., for $i = 1, 4, 7, 10, \ldots, (n-1)$], and failure mode
type II (i.e., for $i = 2, 5, 8, \ldots, n$) probabilities at time $t$
$\lambda_i$ = constant type I and type II failure mode failure rates of the
operational unit, respectively (i.e., for $i = 1, 2$)
$\mu_i$ = constant type I and type II failure mode state replacement rates of
the failed unit, respectively (i.e., for $i = 1, 2$)
$n$ = number of system states
$N$ = number of units in the system (i.e., the operational unit plus
standby units)
$t$ = time
$s$ = Laplace transform variable

Assumptions

1. Failures are statistically independent.

2. A restored unit is as good as new.

3. Cold standby units cannot fail.

4. A failed unit is never repaired.

The system of differential equations associated with Figure 8.15 is



i- 1 



for A!- 3, 6, 9, 12 {n~2) 

for Ar=3,6.9,l2 („-2) 



(8.93) 
(8.94) 

(8.95) 
(8.96) 



*HO-P t jc-z£tfr t + P iK - U U)N - 2 KP K (0 

i-i 

for #=3,6,9,12, -• - (n-2) 

At $t = 0$, $P_0(0) = 1$; other initial condition probabilities are zero.

$n = (3N - 1)$ for $N \geq 1$

where the prime denotes the derivative with respect to time $t$. Solutions to
the above system of differential equations in the $s$ domain are




(8.99) 



J 



(8.100) 



s+ 2 \ 



for K-3,6.9,12 («-2) 



(8.101) 



^( , )-{/(A-- I) (j)*i 1 +/ , (ff _ l) (OM I }- 



(8.102) 
(8.103) 
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for A>3,6.9,12 («-2) 



(8.104) 



(8.105) 



To obtain the state probabilities, invert (8.100)-(8.105) to the time domain
[i.e., take inverse Laplace transforms of (8.100)-(8.105)]. The system
operational availability, $A(t)$, can be obtained from

\[ A(t) = \sum_{i} P_i(t) \tag{8.106} \]

for $i = 0, 3, 6, 9, \ldots, (n-2)$



8.5.8 A k-out-of-n Three-State Device System with
Common-Cause Failures

This section presents a generalized Markov model to represent a repairable
k-out-of-n units system with common-cause failures [14]. This mathematical
model can also be applied to represent a repairable series or parallel
(two- or three-state device) network subject to common-cause failures.
Some of the common-cause failures may occur due to (a) undetected
design errors; (b) operator and maintenance errors; (c) common
environments; (d) common manufacturer; (e) common energy source; (f) same
repairman; or (g) an equipment failure event such as fire, flood, tornado,
or earthquake. A typical example is a redundant configuration composed of
two motorized fluid flow valves with a common (control circuit) power
supply. This type of situation is frequently encountered in power stations.

Assumptions 

1. Three-state devices are identical.

2. The redundant system is repaired only when all devices fail in either
failure mode (i.e., open, short, closed), or if the redundant system fails
due to common-cause failures.

3. Common-cause failures can only occur if two or more three-state
devices are present in a system.

4. A failed system is restored as good as new.

5. Common-cause and other failures are statistically independent.

Notation

$\lambda_i$ = constant open mode failure rate, for $i = 0, 1, 2, 3, \ldots, k$
$\alpha_i$ = constant short mode failure rate, for $i = 0, 1, 2, 3, \ldots, k$
$\gamma_i$ = constant common-cause failure rate, for $i = 0, 1, 2, 3, \ldots, (k-1)$
$\mu_{sn}$ = constant short failure mode repair rate
$\mu_o$ = constant open failure mode repair rate
$\mu_c$ = constant common-cause failure mode repair rate
$P_i(t)$ = state probability at time $t$ for $i = 0, 1, 2, 3, \ldots, n$
(Note: $i = n$ represents the open failure mode probability at time $t$)
$P_c(t)$ = common-cause failure mode probability at time $t$
$P_{sn}(t)$ = short failure mode probability at time $t$
$N$ = total number of devices in a system
$s$ = Laplace transform variable
$t$ = time
The equations associated with Figure 8.16 are

\[ P_0'(t) = -(\lambda_0 + \alpha_0 + \gamma_0) P_0(t) + P_{sn}(t)\,\mu_{sn} + P_c(t)\,\mu_c + P_n(t)\,\mu_o \tag{8.107} \]
\[ P_1'(t) = -(\lambda_1 + \alpha_1 + \gamma_1) P_1(t) + P_0(t)\,\lambda_0 \tag{8.108} \]
\[ P_2'(t) = -(\lambda_2 + \alpha_2 + \gamma_2) P_2(t) + P_1(t)\,\lambda_1 \tag{8.109} \]
\[ P_{k-1}'(t) = -(\lambda_{k-1} + \alpha_{k-1} + \gamma_{k-1}) P_{k-1}(t) + P_{k-2}(t)\,\lambda_{k-2} \tag{8.110} \]

for $k = 2, 3, 4, \ldots, (n-1)$,

\[ P_k'(t) = -(\lambda_k + \alpha_k) P_k(t) + P_{k-1}(t)\,\lambda_{k-1} \tag{8.111} \]

for $k = (n-1)$,

\[ P_n'(t) = -\mu_o P_n(t) + P_k(t)\,\lambda_k \tag{8.112} \]
\[ P_{sn}'(t) = -\mu_{sn} P_{sn}(t) + \sum_{i=0}^{k} P_i(t)\,\alpha_i \quad \text{for } k = n-1 \tag{8.113} \]
\[ P_c'(t) = -\mu_c P_c(t) + \sum_{i=0}^{k-1} P_i(t)\,\gamma_i \quad \text{for } k = n-1 \tag{8.114} \]

$n = N$ for $N \geq 2$
$\lambda_i = (N - i)\lambda$ for $i = 0, 1, 2, 3, \ldots, N$

At $t = 0$, $P_0(0) = 1$; other initial condition probabilities are equal to zero.

Figure 8.16 Transition diagram.

The prime denotes differentiation with respect to time $t$. Laplace
transforms of the state probability equations are


[ + P s „(s)ii sli +P e (^^ + P^)^ 



P,{s)- 



(8.115) 
(8.116) 
(8.U7) 



J + X, ,+ttA-i + Tft-i 
: for Ar=2,3,4 («- I). 

^-' A *-' for A = (n-1), 



X+ht+Ctf. 



(8.118) 



(8.119) 



for k"n-\ 



(8.120) 



(8,121) 




2t^) 

W m 1 k : = «-]. (S.I22) 

To use this model for a series configuration, interchange the open failure
mode probability with the short (closed) failure mode probability.

8.5.9 A 4-Unit Redundant System with Common-Cause Failures

This model [13] can be used for devices with two mutually exclusive failure
modes and common-cause failures. The transition diagram of the model is
shown in Figure 8.17.

Assumptions

1. Common-cause and other failures are s-independent.

2. Common-cause failures can only occur with more than one unit. The 4 units
are identical.

3. Units are repaired only when the system fails. A failed system is
restored as good as new.

4. System repair times are arbitrarily distributed.
Notation

$i$ = state of the unfailed system: number of failed units, $i = 0, 1, 2, 3$
$j$ = state of the failed system: $j = 4$ means failure not due to a
common cause; $j = 4,cc$ means failure due to a common cause;
$j = 4,sc$ means short (closed) mode of failure
$P_i(t)$ = probability that the system is in unfailed state $i$ at time $t$
$p_j(y,t)$ = probability density (with respect to repair time) that the
failed system is in state $j$ and has an elapsed repair time of $y$
$\mu_j(y)$, $q_j(y)$ = repair rate (a hazard rate) and probability density function
of repair time when the system is in state $j$ and has an elapsed
repair time of $y$
$\beta_i$ = constant common-cause failure rate of the system when in
state $i$; $\beta_3 = 0$
$\lambda_i$ = constant failure rate of a unit, for other than common-cause
failures, when the system is in state $i$; $i = 0, 1, 2, 3$
$s$ = Laplace transform variable
$\gamma_i$ = constant short (closed mode) failure rate when the system
is in state $i$; $i = 0, 1, 2, 3$

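Equations (8.123)-(8.128) associated with Figure 8.17 are

\[ \frac{dP_0(t)}{dt} + (\lambda_0 + \beta_0 + \gamma_0) P_0(t) = \sum_{j} \int_0^\infty p_j(y,t)\,\mu_j(y)\,dy \tag{8.123} \]

\[ \frac{dP_i(t)}{dt} + (\lambda_i + \beta_i + \gamma_i) P_i(t) - \lambda_{i-1} P_{i-1}(t) = 0 \tag{8.124} \]

for $i = 1, 2, 3$; $\beta_3 = 0$

\[ \frac{\partial p_j(y,t)}{\partial t} + \frac{\partial p_j(y,t)}{\partial y} + \mu_j(y)\,p_j(y,t) = 0 \tag{8.125} \]

\[ p_4(0,t) = P_3(t)\,\lambda_3 \tag{8.126} \]
\[ p_{4,cc}(0,t) = P_0(t)\,\beta_0 + P_1(t)\,\beta_1 + P_2(t)\,\beta_2 \tag{8.127} \]
\[ p_{4,sc}(0,t) = P_0(t)\,\gamma_0 + P_1(t)\,\gamma_1 + P_2(t)\,\gamma_2 + P_3(t)\,\gamma_3 \tag{8.128} \]

$\lambda_i = (4 - i)\lambda$

$P_i(0) = 1$ for $i = 0$; other $P_i(0) = 0$
$p_j(y,0) = 0$ for all $j$

The Laplace transforms of the solution of (8.123)-(8.128) are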



GJs)X 3 



^ 2 



/fjfj+Aj+ft+yj) 



^3= "T~ 



/»- ^ for 1=1,2,3 



P 4 {s) = \ 3 P 3 (s) 



(8,130) 
(8.B1) 



i'-0 



(8.132.) 



2 



i-D 



(8.133) 



To obtain time domain solutions, (8.129)-(8.133) can be inverted for a
given repair time distribution.



8.6 RELIABILITY OPTIMIZATION OF THREE-STATE DEVICE 
NETWORKS 

This section deals with optimizing the number of redundant elements to
obtain maximum reliability. Here, we focus on obtaining the optimum
number of redundant elements for the series and parallel configurations
only.

8.6.1 Series Network

Using expression (8.1), the series system reliability of $n$ identical elements
is given by

\[ R = (1 - q_o)^n - q_s^n \tag{8.134} \]

To obtain the optimum number of elements, differentiate (8.134) with
respect to $n$ and equate the result to zero. The following result is obtained:

\[ \frac{\partial R}{\partial n} = a_o^n \log_e a_o - q_s^n \log_e q_s = 0 \tag{8.135} \]

where $a_o = (1 - q_o)$.

Thus, rewriting (8.135) in terms of the optimum number of elements $n^*$,
we get

\[ n^* = \frac{\log_e\left[\log_e q_s / \log_e a_o\right]}{\log_e (a_o / q_s)} \tag{8.136} \]
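Because the number of elements must be an integer, the value given by (8.136) is in practice rounded to whichever neighboring integer gives the higher reliability. The Python sketch below illustrates this under stated assumptions: $0 < q_o$, $0 < q_s$, and $q_o + q_s < 1$, so that $n^*$ is positive. The function names and the sample probabilities are hypothetical.

```python
import math


def series_reliability(n, q_o, q_s):
    """Reliability of n identical three-state elements in series."""
    return (1.0 - q_o) ** n - q_s ** n


def optimum_series_elements(q_o, q_s):
    """Return (n_star, best_n).

    n_star is the real-valued stationary point of the reliability;
    best_n is the better of the two integers next to it (at least 1).
    """
    a_o = 1.0 - q_o
    n_star = math.log(math.log(q_s) / math.log(a_o)) / math.log(a_o / q_s)
    candidates = {max(1, math.floor(n_star)), max(1, math.ceil(n_star))}
    best_n = max(candidates, key=lambda n: series_reliability(n, q_o, q_s))
    return n_star, best_n


if __name__ == "__main__":
    # Hypothetical element data: q_o = 0.1 (open), q_s = 0.2 (short).
    n_star, best_n = optimum_series_elements(0.1, 0.2)
    print(n_star, best_n, series_reliability(best_n, 0.1, 0.2))
```

Checking only the two neighboring integers is sufficient here because the derivative of the reliability with respect to $n$ changes sign exactly once when $q_o + q_s < 1$.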



8.6.2 Parallel Network

The following expression is obtained directly from (8.136) by the duality
of the series and parallel forms:

\[ m^* = \frac{\log_e\left[\log_e q_o / \log_e a_s\right]}{\log_e (a_s / q_o)} \tag{8.137} \]

where $a_s = (1 - q_s)$ and $m^*$ is the optimum number of elements.
Optimization of series or parallel network reliability subject to constraints
is presented in reference 24.

9.1 INTRODUCTION

A primary requirement of a modern electric power system is a reasonable
ability to satisfy the customer load requirements. In some electric utilities
this involves generation, transmission, and distribution facilities. In others,
the responsibility may extend over only a part of the total facility. A
complete power system is, however, composed of generation, transmission,
and distribution facilities, each of which contributes its own inherent
difficulties to the problem of satisfying customer requirements. A power
system should be designed, and expansion facilities planned, so that it can
perform its intended function with a reasonable risk. The risk of power
interruption or capacity shortage can be reduced by providing more
redundancy in the transmission and distribution networks and enough
reserve generating capacity. There must, however, be a trade-off between
the reliability of the power supply and the cost involved. Reliability models
provide a means of carrying out this trade-off.

There has been considerable growth in the techniques for the quantitative
evaluation of the reliability of power systems. Because of the structural
similarity of the various power systems, a number of generic reliability
techniques have been developed for the planning, design, and operation of
these systems. This ensemble of concepts, indices, and methods is generally
referred to as power system reliability. Numerous papers [12] and two
books [9, 11] have been written on this subject. This chapter gives a
compact and unified approach to power system reliability, and references
to more detailed discussions are provided.

The major areas of a power system are generation, transmission, and
distribution. For determining the reliability indices, the entire power
system is not considered. Although conceptually possible, the complexity and
dimensionality make this task rather impractical at present. It appears,
however, that this task will become possible in the future, partly through
better modeling techniques and partly through increases in both the speed
and power of computers. At present, however, the major divisions in power
system reliability are generating capacity reliability, bulk power system
reliability, interconnected systems reliability, and the reliability of
distribution systems.

9.2 GENERATING CAPACITY RELIABILITY

Generating capacity reliability evaluation can be considered in two basic
forms, which may be designated static reserve and operating reserve
requirements. The static reserve studies are concerned with determining
the installed reserve capacity sufficient to provide for unplanned and
planned outages of generating units and uncertainties in the forecast load.
The operating reserve consists of spinning or quick starting units and is a
capacity that must be available to meet load changes and also be capable of
covering the loss of some portion of generating capacity. Whereas the
static reserve is of primary concern to the planning engineer, the operating
reserve provides assistance in decisions on daily operation of the power
system. Ideally both of these areas must be investigated at the planning
level, but once a decision has been reached the operating reserve becomes an
operating problem. This section is concerned with the static reserve area;
the operating reserve is described in Section 9.3.

Generating capacity reliability studies assume the transmission network
to be perfectly reliable and capable of transferring the energy from any
generation point to the load point. This amounts to the assumption that all 
the generating units and loads are connected across a single bus. The 
assessment is basically concerned with the certainty with which the system 
load can be satisfied by the generation facilities. The three basic steps 
involved are [17]: 

1. A model describing the probabilistic behaviour of capacity outages is 
developed first. This is referred to as the "generation system model." 

2. The probabilistic nature of the daily load curve is incorporated into a
"demand model" or "load model."

3. The generation and load models are then merged or convolved to give a
"generation reserve model" which depicts the expected occurrence of 
surplus capacity and capacity deficiencies. Several indices are defined 
on the generation reserve as measures of generating capacity reliability. 

9.2.1 Generation System Model

Model of a Single Unit. A generating unit, especially a large thermal one, 
may have several different capacity levels. The consideration of partial or 
derated capacity states is not a major problem; however, in order to
illustrate the basic approach each unit is assumed to exist either in an up 
(full capacity) or in a down (zero capacity) state. This binary model can be 
characterized by the following parameters: 

c = capacity of the unit in MW
m = mean up time of the unit
r = mean down time of the unit

Using these parameters, the model of the single unit can be expressed as
follows:

\[ \Pr(\text{capacity out} = 0) = \frac{m}{m+r} = \frac{\mu}{\lambda+\mu} \tag{9.1} \]

\[ \Pr(\text{capacity out} = c) = \frac{r}{m+r} = \frac{\lambda}{\lambda+\mu} \tag{9.2} \]

\[ \mathrm{Fr}(\text{capacity out} = 0) = \mathrm{Fr}(\text{capacity out} = c) = \frac{\lambda\mu}{\lambda+\mu} \tag{9.3} \]

where

λ, μ = the reciprocals of m and r, called the failure and repair rates
       of the unit
Pr(·), Fr(·) = the steady-state probability and frequency of (·),
       respectively.



Equations 9.1-9.3 are independent of the form of the probability density
function of the up and down times. Equation 9.2 can be recognized as the
unavailability of the unit; it has traditionally been called the "forced
outage rate" (FOR) and is defined as [13]

\[ \mathrm{FOR} = \frac{\text{forced outage hours}}{\text{in-service hours} + \text{forced outage hours}} \]
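
As a minimal sketch of the single-unit model in (9.1)-(9.3), the following Python fragment computes the availability, the forced outage rate, and the cycle frequency from the mean up and down times. The class and attribute names (`BinaryUnit`, `cycle_frequency`, and so on) and the numerical values are illustrative assumptions, not part of the original text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryUnit:
    """Two-state generating unit; names are illustrative, not from the text."""
    capacity_mw: float      # c
    mean_up_time: float     # m
    mean_down_time: float   # r

    @property
    def failure_rate(self) -> float:
        # lambda = 1/m
        return 1.0 / self.mean_up_time

    @property
    def repair_rate(self) -> float:
        # mu = 1/r
        return 1.0 / self.mean_down_time

    @property
    def availability(self) -> float:
        # Pr(capacity out = 0), equation (9.1)
        return self.mean_up_time / (self.mean_up_time + self.mean_down_time)

    @property
    def forced_outage_rate(self) -> float:
        # Pr(capacity out = c), equation (9.2)
        return self.mean_down_time / (self.mean_up_time + self.mean_down_time)

    @property
    def cycle_frequency(self) -> float:
        # Fr(.) of equation (9.3): one up-down cycle every (m + r) time units
        return 1.0 / (self.mean_up_time + self.mean_down_time)


# Assumed example data: 50 MW unit, 980 h mean up time, 20 h mean down time.
unit = BinaryUnit(capacity_mw=50.0, mean_up_time=980.0, mean_down_time=20.0)
print(unit.availability, unit.forced_outage_rate, unit.cycle_frequency)
```

The cycle frequency 1/(m + r) is the same quantity as λμ/(λ + μ), which can be checked by substituting λ = 1/m and μ = 1/r.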

System Model. The generation system consists of many units and they are 
assumed statistically independent. Such a system can have many possible 
capacity levels and the reliability measures of the following form are 
required. 

1. Pr(lost system capacity ≥ x), the steady-state probability of lost capacity
equal to or greater than x MW. The state ≥ x is commonly called the
cumulative state, as compared with the exact state, that is, equal to x
MW.

2. Fr(lost system capacity ≥ x), the mean frequency of encountering the
cumulative state of x or more MW lost capacity. The reciprocal of this
function gives the mean cycle time, that is, the mean time between two
successive encounters of this state.

One approach is to enumerate all the system states, calculate the
probabilities of these states, and determine the subset probabilities. The
frequencies can be readily calculated using the frequency [...]. This
method is practicable, however, only as long as the number of units
comprising the system is relatively small. A generation system may have
several hundred units and therefore this approach is not practical. The
generation system model is typically developed by the sequential addition
of units.



Algorithm for Unit Addition. This algorithm [20] is basic to [...].
The assumption of binary units is for keeping the discussion simple. It has
been shown in reference 26 that the probability density functions
of up and down times do not affect the steady-state probabilities and mean
frequencies when the units are assumed statistically independent. The
following notation is used:

C_i = ith lost capacity state. Capacity states are assumed to be
      arranged in increasing order of lost capacity, C_{i+1} > C_i
P_i, f_i = steady-state probability and mean frequency of lost capacity
      ≥ C_i
c_k = capacity of the kth unit
λ_k = 1/m_k
m_k = mean up time of unit k

^ + *>] - steady-state probabilily and mean frequency of lost capacity 

n_k = number of capacity states for the k unit system
r_k = mean down time of unit k

I -IT "after," that is. after the unit addition or after the umt 
removal (superscript) 
The process of model building is started with a single unit and then each
remaining unit is added in turn; the model for a single unit is
straightforward. Now assume that a system model exists for (k - 1) units
and it is required to add the kth unit. Since the unit is assumed to exist
either in the up state (lost capacity = 0) or in the down state (lost capacity
= c_k), two groups of system states would be obtained after unit addition,
{C_i} and {C_i + c_k}, i = 1, 2, ..., n_{k-1} (see Figure 9.1). The states in the

Figure 9.1 State frequency diagram for unit addition.



former group are termed "existing lost capacity states" and those in the
latter "generated lost capacity states." As an example consider C_5 such that
(C_1 + c_k) < C_5 and (C_2 + c_k) ≥ C_5. The boundary for this state is shown in
Figure 9.1 and the associated frequency is

\[ f_5^m = f_5^o p_k + f_2^o (1 - p_k) + (P_2^o - P_5^o) p_k \lambda_k \tag{9.5} \]

In general, the modified cumulative probability and frequency of existing lost
capacity state i are given by

\[ P_i^m = P_i^o p_k + P_j^o (1 - p_k) \tag{9.6} \]

and

\[ f_i^m = f_i^o p_k + f_j^o (1 - p_k) + (P_j^o - P_i^o) p_k \lambda_k \tag{9.7} \]

such that

\[ C_j \ge (C_i - c_k) \]

where p_k = m_k/(m_k + r_k) is the steady-state availability of unit k.
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Superscript o refers to the old values of probabilities and frequencies and 
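superscript m refers to the modified values after unit addition. The
probabilities and frequencies of generated lost capacity states are similarly
given by

\[ P_{i+k}^m = P_i^o (1 - p_k) + P_j^o p_k \tag{9.8} \]

and

\[ f_{i+k}^m = f_i^o (1 - p_k) + f_j^o p_k + (P_i^o - P_j^o) p_k \lambda_k \tag{9.9} \]

such that

\[ C_j \ge (C_i + c_k) \]

where the subscript (i+k) denotes the generated state C_i + c_k.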

The flow diagram for the computer implementation of this algorithm is 
provided in reference 20. The procedure described in reference 20 accom- 
plishes calculation and state reordering at the same time and is very fast. 
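
The recursion for unit addition can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure of reference 20: the function name `add_unit`, the tuple layout of the model, and the example data are hypothetical. Existing and generated states are handled by one rule, in which the cumulative probability and frequency at a lost capacity level x are combined with those at the level x - c_k, with p_k taken as the unit availability m_k/(m_k + r_k).

```python
from bisect import bisect_left

# A model is a list of (lost_capacity, cumulative_probability, cumulative_frequency)
# sorted by lost capacity. With no units, lost capacity >= 0 is certain.
EMPTY_MODEL = [(0, 1.0, 0.0)]


def add_unit(model, capacity, mean_up, mean_down):
    """Return the cumulative model after adding one binary unit (sketch)."""
    p = mean_up / (mean_up + mean_down)   # availability p_k
    lam = 1.0 / mean_up                   # failure rate lambda_k
    levels = [c for c, _, _ in model]

    def lookup(x):
        # Old probability and frequency of lost capacity >= x: values of the
        # first old state at or above x, or zero when x exceeds every state.
        i = bisect_left(levels, x)
        if i == len(levels):
            return 0.0, 0.0
        return model[i][1], model[i][2]

    # Existing states C_i and generated states C_i + c_k.
    new_levels = sorted(set(levels) | {c + capacity for c in levels})
    result = []
    for x in new_levels:
        prob_up, freq_up = lookup(x)                # unit k up
        prob_dn, freq_dn = lookup(x - capacity)     # unit k down
        prob = prob_up * p + prob_dn * (1.0 - p)
        freq = (freq_up * p + freq_dn * (1.0 - p)
                + (prob_dn - prob_up) * p * lam)
        result.append((x, prob, freq))
    return result


# Assumed example: two identical 50 MW units, m = 980 h, r = 20 h.
one_unit = add_unit(EMPTY_MODEL, 50, 980.0, 20.0)
two_units = add_unit(one_unit, 50, 980.0, 20.0)
for row in two_units:
    print(row)
```

Integer capacities are used so that generated levels coincide exactly with existing ones; with fractional capacities some rounding of levels would be needed before merging.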

Algorithm for Unit Removal. It is often necessary to remove
a unit from the system model. For example, during the period of a year
different units are on scheduled maintenance and therefore the same
system model cannot be used for the whole period. The year can be
divided into a number of intervals during which the units on scheduled
maintenance stay the same and a single system model can be used. These
system models can be derived from the master system model by removing
the units on maintenance.

Unit removal is the reverse of the process of unit addition described by
(9.6)-(9.9). To reconstruct the system model prior to the addition of unit
k, (9.8) and (9.9) can be modified. Since C_j is equal to or just greater than
C_i + c_k,

and 

Substituting (9.10) and (9.11) into (9.8) and (9.9) and replacing
subscripts (i+k) and i by i and j, respectively,

\[ P_i^m = P_i^o p_k + P_j^o (1 - p_k) \tag{9.12} \]

and

\[ f_i^m = f_i^o p_k + f_j^o (1 - p_k) + (P_j^o - P_i^o) p_k \lambda_k \tag{9.13} \]

such that

\[ C_j \ge (C_i - c_k) \]

Equations 9.12 and 9.13 are the same as (9.6) and (9.7). Probabilities and
frequencies of the old system model are therefore

\[ P_i^o = \frac{P_i^m - P_j^o (1 - p_k)}{p_k} \tag{9.14} \]

and

\[ f_i^o = \frac{f_i^m - f_j^o (1 - p_k) - (P_j^o - P_i^o) p_k \lambda_k}{p_k} \tag{9.15} \]

such that

\[ C_j \ge (C_i - c_k) \]

The algorithm is started with P_j^o = 1 and f_j^o = 0. After each unit removal,
the system model may contain sets of states with different lost capacity
values but the same probabilities and frequencies. In these sets, all the
states except the last one should be deleted to obtain the exact system
model prior to the addition of unit k. The flow diagram for the computer
implementation of this algorithm is given in reference 20.



9.2.2 Load and Generation Reserve 

Load Model for Loss of Load Expectation. One of the indices used in static
reserve studies is loss of load probability (LOLP) or loss of load
expectation (LOLE). The load model generally employed for LOLE calculation is
of the shape shown in Figure 9.2. This cumulative load curve indicates the
time for which the load is more than a specified level in MW. The curve is
derived either from hourly load durations or, more generally, from the daily
peaks. In the latter case, it indicates the number of days on which the peak
exceeded a specified value. The LOLE is an expected value and is given by
[6]

\[ \mathrm{LOLE} = \sum_i p_i t_i \tag{9.16} \]

where p_i = probability of a capacity outage equal to c_i
      t_i = number of time units, in the study period, that a capacity
            outage of c_i would result in a loss of load

The values of p_i can be obtained from the generation system model
discussed in Section 9.2.1 by the following relationship:

\[ p_i = P_i - P_{i+1} \]

The LOLE is expressed in days/year, that is, the expected time in days
that the load would not be met by the capacity in a period of a year. The
magnitude of load loss is not considered. The reciprocal of LOLE in
years/day is often used as a reliability measure; however, it does tend to
obscure the fact that LOLE is a simple expectation.
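
A small Python sketch of the LOLE sum (9.16) is given below. It assumes that the load model is a list of daily peaks, and that an outage c_i causes loss of load on every day whose peak exceeds the remaining capacity; the function names and all numbers are illustrative assumptions rather than data from the text.

```python
def exact_probabilities(cumulative):
    """p_i = P_i - P_{i+1}, with the term beyond the last state taken as zero."""
    shifted = list(cumulative[1:]) + [0.0]
    return [p - q for p, q in zip(cumulative, shifted)]


def lole(outages, cumulative, installed_capacity, daily_peaks):
    """Loss of load expectation in days per study period (sketch of 9.16)."""
    total = 0.0
    for c_i, p_i in zip(outages, exact_probabilities(cumulative)):
        available = installed_capacity - c_i
        # t_i: number of days on which the peak exceeds the available capacity
        t_i = sum(1 for peak in daily_peaks if peak > available)
        total += p_i * t_i
    return total


# Assumed example: two 50 MW units with FOR = 0.02 and a four-day study period.
outages = [0, 50, 100]
cumulative = [1.0, 0.0396, 0.0004]
value = lole(outages, cumulative, installed_capacity=100,
             daily_peaks=[40, 60, 45, 55])
print(value)
```

In this example only the 50 MW and 100 MW outage states contribute, because the full 100 MW of capacity covers every daily peak.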

Load Model for Frequency and Duration. The load model for the loss of load
probability method does not adequately reflect the shape of the daily load
variation curve. Reference 16 suggested a load model introducing an
exposure factor to indicate that the peak load does not persist for the
entire day. The mean duration at a particular load level, usually about M%
of the daily peak, is assumed to be the exposure factor. This amounts to
approximating the daily load variation curve as shown in Figure 9.3 and
taking the mean of e_i as the exposure factor. The load model is then
assumed to consist of a random sequence of N load states, each of which is
followed by a low load state (see Figure 9.4). The state transition diagram
for this load model is shown in Figure 9.5 and some parameters are given
below.

Load levels, MW                    L_i, i = 1, 2, ..., N
                                   L_1 > L_2 > ... > L_N
Number of occurrences of L_i       n_i
Interval length                    D = Σ n_i days
Exposure factor, that is, the
mean duration of the peak          e < 1 day

Figure 9.3 The daily load variation curve.

Figure 9.4 The basic load model.

Figure 9.5 State transition diagram of the basic load model.

Capacity Reserve Model. Assuming stochastic independence of the generation
system model and the load model, they may be combined or convolved to give
the capacity reserve model. Capacity reserve or margin is an excess of
available capacity over demand, that is

margin = capacity - load

The probabilities and frequencies of cumulative margins (margin ≤ M) are
of interest and these may be computed by combining the cumulative
capacity states with the exact load states. Using the margin state
matrix approach [18], it can be proved that

(9.17)

where A_M = steady-state probability of margin ≤ M
      P_{c_i} = probability of capacity ≤ C_{c_i} such that C_{c_i} ≤ (L_i + M)
      N + 1 = the low load state, L_0

The margin ≤ M could be encountered either by a change in the
system capacity or a change in system load. The contributions to f_M, the
frequency of encountering a margin ≤ M, by these two types
of transitions are termed the generation system transitions and the load
model transitions [18].

where

and

(9.21)

It should be noted that in (9.17)-(9.21), c_i is a function of the load level
L_i selected and is so determined that C_{c_i} ≤ (L_i + M), and c_0 is
that of the capacity level corresponding to the low load level such that
C_{c_0} ≤ (L_0 + M).

The CVEF and MLEF Load Models. The single exposure factor
model discussed previously assumes a random sequence of daily peaks.
This is generally not true and there is likely to be a strong sequential
correlation between the daily peaks. The choice of the exposure factor is
arbitrary and its nature is questionable. For example, if the capacity level c
exists for the day shown in Figure 9.3, then according to this load model a
negative margin of m will exist for a duration e_i. This obviously is not an
accurate representation.

Load models for accurate representation of the system load were developed
in reference 18 and described in the related publications [7, 8]. In
these models, the expected daily load variation curve is approximated by
the mean durations at various load levels determined as a percentage of
the daily peak. The expected daily curve may or may not be symmetrical.
It is shown in references 7, 8, and 18 that the asymmetry does not affect
the steady-state probability and frequency of encountering a margin. The
MLEF (multilevel exposure factor) representation amounts to approximating
the daily load variation curve as shown in Figure 9.6. For a bimodal
curve this means transferring segments like d to d'. This does not alter the
steady-state availability of the failure state, but the frequency is slightly
decreased and the mean duration slightly increased. For example, for a
given capacity level c_j the load exceeds the available capacity twice in a
day with durations e_i and d, respectively. Using this approximation, the
load exceeds c_j once with duration (e_i + d). Such an approximation may
even give more realistic indices.

Whereas the MLEF representation assumes discrete variations in the 
exposure factor, the CVEF (continuously varying exposure factor) model 
assumes the exposure factor as a continuously varying function of the 
percentage of daily peak. The continuous approximation does not involve 
any additional difficulty and relatively little extra computational effort is 
required. 

Capacity Reserve Model. Four types of load models are described in
references 7, 8, and 18 depending upon the manner in which low load is
taken into account and the sequential interdependence of the daily peaks.
The four possible combinations are

1. Low load assumed the same on all days and the sequence of daily peaks
assumed random.

2. Same as (1) above except that there is sequential interdependence
between daily loads.

3. Low load different on different days and the sequence of daily loads
assumed random.

4. Same as (3) except that there is sequential interdependence between
daily peaks.

Both MLEF and CVEF representations are possible for the four combinations
described above. Expressions for combination one, using the CVEF
representation, are derived in this section. For other cases, the reader is
referred to references 7, 8, and 18. The derivations for the availability and
frequency of margin ≤ M can be understood with reference to Figure
9.7, which shows the exposure factor as a continuously varying function of
the daily peak. For a given capacity level C_v, the corresponding load level
L_v for a margin ≤ M can be determined using the relationship

C_v - L_v ≤ M

that is,

L_v ≥ C_v - M

The expected duration of margin ≤ M for the ith load cycle can be easily
shown [18],

(9.22)

Figure 9.7 Exposure factor as a continuously varying function of the daily
peak (horizontal axis: time in hours).

where d_{v+x} = the mean duration for which the load ≥ L_{v+x}
      P_{v+x} = the probability of capacity ≤ C_{v+x}

In (9.22), v is such that C_v - M makes the first intercept with the daily
load cycle and n is such that C_{v+n} - M is at or just below the low load
level. Assuming n_i identical peak loads L_i, the expected duration of margin
≤ M in the period of D days

= D A_M (9.23)

where N = total number of peak load levels, that is, Σ n_i = D, with the
          sum taken over i = 1, ..., N
      A_M = probability of margin ≤ M

From (9.23)

(9.24)

Frequency of Margin ≤ M. The contribution to the frequency by the
generation system transitions can be determined by finding the expected
transitions out of the margin states ≤ M. In the period h_2h_3 (Figure 9.7)
the load is ≥ L_v and therefore, for the capacity level C_v, the margin ≤ M.
The generation system may transit, during this period, to a higher capacity
making the margin > M or to a lower capacity without change of cumulative
margin state. Therefore,

The expected transitions during
h_2h_3 from margin ≤ M to margin > M = d_v f_v (9.25)

where f_v = the frequency of encountering a capacity ≤ C_v or > C_v.

During periods h_1h_2 and h_3h_4, the load is ≥ L_{v+1} and the transitions
to capacity level > C_{v+1} can cause change of cumulative margin state.
Therefore,

The expected transitions during
h_1h_2 and h_3h_4 from margin ≤ M to margin > M = (d_{v+1} - d_v) f_{v+1}
(9.26)

Generalizing from (9.25) and (9.26), the expected transitions due to all
the capacity levels, in the ith load cycle are

The expected generation system transitions out of margin ≤ M during D
days are

From (9.28)

The load exceeds a given capacity level once a day. Therefore

From (9.29) and (9.30)

It should be noted that in (9.24) and (9.29), v is different for different
load cycles.

Extension to the Loss of Energy Concept. The load models proposed
can be readily extended to evaluate the probable curtailment of energy due
to capacity shortages. Assuming the CVEF representation, the daily load
variation curve is shown in Figure 9.8. The mean durations are stored in
the computer in discrete steps and the mean duration between any two
discrete steps can be computed by linear interpolation. The magnitude of
load corresponding to the mean duration d_j is called the subload level L_{ij},
where i refers to the peak load level.

For a capacity level C_v, the energy curtailed is equal to the area
bounded by the expected load variation curve and the mean duration d_v
for which the load exceeds C_v. The area of the shaded portion is

\[ = \tfrac{1}{2} (d_j + d_{j-1}) I_i \]

Figure 9.8 Expected daily load variation curve for the ith peak
(horizontal axis: duration in hours).

where I_i = the interval in MW between the successive subload levels, for
            the ith peak load
          = L_i p/100

where L_i = the ith peak load, MW
      p = the interval between successive subload levels as a percentage
          of the daily peak

The energy curtailed

\[ = \tfrac{1}{2}(d_j + d_v) x_v + \tfrac{1}{2}(d_j + d_{j-1}) I_i + \cdots + \tfrac{1}{2}(d_0 + d_1) I_i \]

\[ = \tfrac{1}{2}\left[ (d_j + d_v)(I_i - y_v) + (d_j + d_{j-1}) I_i + \cdots + (d_0 + d_1) I_i \right] \]

where

\[ \sum_{k=0}^{j} d_k \]

is the cumulative total of the subload mean durations up to the jth sublevel
such that L_{ij} ≥ C_v

where L_{ij} = the value of the jth sublevel of the ith peak, MW

The expected curtailment of energy, given C_v and L_i

The total expected curtailment of energy in a period of D days

EN = (9.32)

where p_v = the exact state availability of capacity level C_v
      n_i = number of occurrences of peak L_i, i = 1, 2, ..., N

such that

D = Σ n_i



The expression (9.32) is suitable for digital computation and its execution
is very fast. The value of EN given by this expression may be
multiplied by a suitable cost factor in $/MW-hr to get the expected loss in
dollars. To act as an index of reliability, EN must be normalized by
dividing it by the total energy required by the system. This quantity is
given by the expression

(9.33)

where n = the sublevel just at the low load. The normalized quantity, which
is rather small, may be subtracted from unity to get what is conventionally
called the Energy Index of Reliability. Thus

\[ \mathrm{EIR} = 1 - \mathrm{EN}/\mathrm{ENP} \tag{9.34} \]

Sequential Correlation of Daily Peaks. A number of models have
been proposed and analyzed in references 7 and 18. The essential difference
between the various load models is the assumptions regarding low
load and the sequential correlation of daily peaks. The analysis leads to an
interesting conclusion that, if the low load can be considered of constant
magnitude, the numerical values of the probability and frequency of
margin states are the same whether the sequence of peaks is assumed
random or correlated. For most systems, the probability of having capacity
as low as the low load period is very small and therefore the error in the
numerical indices of reliability because of the assumption of constant low
load is insignificant. The model discussed here is thus adequate in most
situations of interest. If, however, low load cannot be assumed constant,
the numerical indices are affected and appropriate load models [7, 18] can
be employed.
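
The energy curtailment and the energy index of reliability of (9.34) can be illustrated with a short Python sketch. It treats one daily curve and one capacity level only, so it is the inner area calculation rather than the full expectation EN over all capacity levels and peaks. The function names, the trapezoidal treatment, and the example curve are assumptions made for illustration.

```python
def duration_at(level, loads, durations):
    """Mean duration (hours) for which the load is at or above `level`.

    `loads` are subload levels in ascending MW and `durations` the matching
    mean durations; values in between are linearly interpolated.
    """
    if level <= loads[0]:
        return durations[0]
    if level >= loads[-1]:
        return durations[-1]
    pairs = list(zip(loads, durations))
    for (l0, d0), (l1, d1) in zip(pairs, pairs[1:]):
        if l0 <= level <= l1:
            return d0 + (d1 - d0) * (level - l0) / (l1 - l0)


def energy_curtailed(capacity, loads, durations):
    """Area between the duration curve and `capacity`, in MWh (trapezoids)."""
    if capacity >= loads[-1]:
        return 0.0
    points = [(capacity, duration_at(capacity, loads, durations))]
    points += [(l, d) for l, d in zip(loads, durations) if l > capacity]
    return sum(0.5 * (d0 + d1) * (l1 - l0)
               for (l0, d0), (l1, d1) in zip(points, points[1:]))


def energy_index_of_reliability(en, enp):
    """EIR = 1 - EN/ENP, equation (9.34)."""
    return 1.0 - en / enp


# Assumed example curve: load at least 50 MW for 24 h, falling linearly to a
# 100 MW peak of zero duration.
loads, durations = [50.0, 100.0], [24.0, 0.0]
en = energy_curtailed(75.0, loads, durations)     # shortage above 75 MW
enp = energy_curtailed(0.0, loads, durations)     # total energy demanded
print(en, enp, energy_index_of_reliability(en, enp))
```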

9.3 ASSESSMENT OF OPERATING RESERVE

The operating reserve evaluation is concerned with the ability of the
generation system to meet the load within the next few hours. If a
generating unit fails, additional capacity can be brought in after a time
equal to the start up time of the reserve units. This time, known as lead
time, delay time, or start up time, is different for different types of units. It
is of the order of a few minutes for hydraulic and gas turbine units; for the
thermal units on cold standby it may be 4-24 hours. One way of reducing this
time is to keep the boilers banked; these units are called hot reserve units.
The reserve connected to the bus and ready to take load is called spinning
reserve. This spinning reserve together with the rapid start and hot reserve
units is called operating reserve. The basic problem is to decide how much
reserve to have so that the load can be satisfied with a reasonable level of
risk. Three methods have been proposed for the assessment of the operating
reserve and an excellent review of these is provided in reference 15. These
methods are briefly discussed in this section.

9.3.1 Basic PJM Method

This method was first described in 1953 by a group associated with the
Pennsylvania-New Jersey-Maryland interconnection [2]. The index computed
is the probability of having insufficient capacity in operation at a
future time equal to the time needed to bring in additional generating
capacity. It is assumed that there is enough installed capacity and it is just
a matter of time before the additional capacity can be brought in to share
the load. The present state of the system is assumed known and the start
up time of all the standby units is considered the same. The procedure for
computation is basically similar to the static reserve evaluation. The
essential difference is the time.

The concepts will be illustrated using a two-state unit. The probability of
finding the unit failed at time T, given that it is operating at t = 0, is [23]

\[ P_{\mathrm{down}}(T) = \frac{\lambda}{\lambda+\mu} - \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)T} \tag{9.35} \]

where λ, μ are the failure and repair rates of the unit.
If (λ + μ)T ≪ 1, then (9.35) can be approximated as

\[ P_{\mathrm{down}}(T) \approx \lambda T \tag{9.36} \]

Expression (9.36) can also be obtained by assuming that there is no
repair in time T and that T is small. If T is the start up time of additional
capacity, (9.36) represents the probability of losing capacity and not
being able to replace it, and is therefore known as the outage replacement
rate (ORR). After computing the ORRs for the units scheduled at time t = 0,
the probability of the various capacity outage states is calculated either by
[...] reserve evaluation.

Risk Calculation. The load model for the operating
reserve is the forecast load at T. The risk R, that is, the probability
of having insufficient capacity at T, is

\[ R = \sum_i \Pr(\text{load at } T = L_i) \, \Pr(\text{capacity at } T < L_i) \tag{9.37} \]

where L_i = the ith value of forecast load. The continuous distribution of
forecast load is approximated by a discrete distribution. If no uncertainty
in forecast load is assumed, then there is only one value of L_i and

\[ R = \Pr(\text{capacity at } T < \text{load at } T) \tag{9.38} \]

The computed R is then compared with a maximum tolerable risk R_ref
to decide whether the system is adequately secure. Depending on whether R is
more or less than the reference value, suitable action can then be taken. At
present the selection of R_ref is not simple and is generally based on
judgment and past experience.
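
The basic PJM calculation can be sketched in Python as follows, under the assumptions that every scheduled unit is binary, that units fail independently, and that no repair takes place within the lead time. The function names, the option to use 1 - exp(-λT) in place of λT, and the unit data are illustrative assumptions rather than anything taken from reference 2.

```python
import math
from collections import defaultdict


def outage_replacement_rate(failure_rate, lead_time, exact=False):
    """ORR of one unit: lambda*T, or 1 - exp(-lambda*T) with no repair."""
    if exact:
        return 1.0 - math.exp(-failure_rate * lead_time)
    return failure_rate * lead_time


def capacity_distribution(units, lead_time):
    """Exact-state distribution of available capacity at the lead time.

    `units` is a list of (capacity_mw, failure_rate) pairs.
    """
    dist = {0.0: 1.0}
    for capacity, failure_rate in units:
        orr = outage_replacement_rate(failure_rate, lead_time)
        updated = defaultdict(float)
        for available, prob in dist.items():
            updated[available + capacity] += prob * (1.0 - orr)  # unit still up
            updated[available] += prob * orr                     # unit lost
        dist = dict(updated)
    return dist


def pjm_risk(units, lead_time, load_forecast):
    """Risk of (9.37); `load_forecast` is a list of (load_mw, probability)."""
    dist = capacity_distribution(units, lead_time)
    return sum(p_load * sum(p for cap, p in dist.items() if cap < load)
               for load, p_load in load_forecast)


# Assumed example: two 50 MW units, lambda = 0.001 per hour, 2 h lead time,
# and a forecast load of 60 MW known with certainty, as in (9.38).
units = [(50.0, 0.001), (50.0, 0.001)]
risk = pjm_risk(units, lead_time=2.0, load_forecast=[(60.0, 1.0)])
print(risk)
```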



9.3.2 Modified PJM Method 

The basic PJM method can be modified to incorporate the effect of rapid
start and hot reserve units [10]. The models for the rapid start and hot
reserve units are shown in Figures 9.9 and 9.10. The state transition
diagrams are self-explanatory and the transition rates are given by

\[ \lambda_{ij} = \frac{n_{ij}}{T_i} \tag{9.39} \]

where λ_ij = transition rate from state i to state j
      n_ij = number of transitions from state i to state j during the time T_i
             spent in state i

Figure 9.9 A 4-state model for rapid start units (states: fail to start,
failed, ready for service, in service).

Assuming the times in the various states to be exponentially distributed,
the state differential equations can be written as

(9.40)

The set of differential equations (9.40) can be solved using [...]. The risk
is evaluated in the following steps:

1. Using the scheduled units, that is, the units on line at t = 0, the
probability of having insufficient generation at t_r is computed using the
basic PJM approach, that is,

R(t_r^-) = risk during the interval (0, t_r)
         = probability of having insufficient generation at t_r

2. During (0, t_r) the rapid start units are assumed to be in the ready for
service state with a probability of unity. Using the differential equations
(9.40) and the appropriate initial condition, the probabilities of finding the
rapid start units in the various states at t_h are determined. [...] The model
of the rapid start units is combined with the [...] load at t_h to find
R(t_h^-), the probability of having insufficient generation at t_h. Also
R(t_r^+) is computed by considering all the units in operation at t = 0
[...] probabilities along with the probabilities of the rapid start units at
t_r, which is, in fact, zero time for these units.

3. At t_h the hot reserve units are assumed to become available. [...] the
interval (0, t_h) [...] hot reserve units [...] a probability of unity [...]
equations (9.40) [23]. [...] is given by

\[ R = R(t_r^-) + \left[ R(t_h^-) - R(t_r^+) \right] + \left[ R(t) - R(t_h^+) \right] \tag{9.41} \]



and 



p H [i)~i-PAt) 



J = 



/= 



'23 



P — 



and 



where 



JW 2 ,( 0+^(0+^(0 



ZJ7 



The risk given by (9.41) is used as an index in the modified PJM method.
It appears, however, difficult to assign a physical significance to this index.
Also it appears that reference 10 does not incorporate the models for rapid
start and hot reserve units in a realistic manner. Denoting P_ij(t) as the
probability of being in state j starting in state i, reference 10 appears to
compute the probability of a rapid start unit as

(9.42)

(9.43)

where P_u(t), P_d(t) are the probabilities of the rapid start units being up
and down, respectively, at time t. The initial condition assumed is P_2(0) =
1.0, which does not reflect the fact that at t = 0, the unit was commanded
to start. Reference 1 suggests an improvement in the procedure for
incorporating the effect of rapid start and hot reserve units. When a rapid
start unit, for example, is commanded to start, it either starts (state 1) or
fails to do so (state 3). Denoting the probability of starting or failing to
start by s and f, respectively,

(9.44)

(9.45)

The probabilities of various states are now computed with the probabilities
of starting in states 2 and 3 as s and f, respectively. After computing
P_ij(t), the probabilities of the unit being down and up, given the period of
need, are computed as

(9.46)

(9.47)

The probabilities calculated by (9.46) and (9.47) are used in the risk
computation instead of P_d(t) and P_u(t) given by (9.42) and (9.43).
9.3.3 Security Function Method

The security function method was proposed in reference 14 and later
refined and expanded in several publications [15]. Basically this method
evaluates the probability of system trouble as a function of time. The time
span for computation is the lead time required for the modification of the
system operating configuration to achieve improved system security. The
form of security function suggested in reference 14 is

\[ S(t) = \sum_i P_i(t) \, W_i(t) \tag{9.48} \]

where P_i(t) = probability of the system being in state i at time t
      W_i(t) = probability that the system configuration of state i results
               in system trouble

Equation 9.48 in its general form can be applied to the entire set of
components comprising a bulk power system. When applied to the operating
reserve problem, S(t) indicates the probability of insufficient capacity
at time t into the future. The function S(t) is examined for a time period
equal to the lead time, that is, the time to start and synchronize additional
capacity. If the security function exceeds a predefined reference value,
then a decision to start additional capacity can be taken. Likewise, if the
system appears too secure, appropriate generating capacity can be taken
out for economic operation. This method treats the standby generators in a
rational manner and in conformity with the normal operating practice. Unlike
the modified PJM method, the standby generators are started only when a
scheduled unit fails. The standby generators are shut down
when the scheduled unit has been repaired.

Unlike the modified PJM method, there is no difficulty in interpreting the
security function S(t). It should be noted that when only the generating
system is considered and t is the shortest start up time of the standby
generators, then the risk obtained by the security function method is the
same as the basic PJM method.
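
The security function is a probability-weighted sum over system states, which the following Python sketch evaluates at a single instant. It assumes that the state probabilities P_i(t) and the trouble probabilities W_i(t) have already been obtained by some other means; the function name and the numbers are hypothetical.

```python
def security_function(state_probabilities, trouble_probabilities):
    """S(t) = sum over states of P_i(t) * W_i(t), evaluated at one instant."""
    if len(state_probabilities) != len(trouble_probabilities):
        raise ValueError("one trouble probability is needed per state")
    return sum(p * w for p, w in zip(state_probabilities, trouble_probabilities))


# Assumed example for the operating reserve case: three capacity states of
# 100, 50, and 0 MW against a 60 MW load. W_i is 1 where capacity is
# insufficient and 0 otherwise.
p_states = [0.996004, 0.003992, 0.000004]
w_states = [0.0, 1.0, 1.0]
s_value = security_function(p_states, w_states)
print(s_value)
```

With W_i restricted to 0 or 1 in this way, S(t) reduces to the probability of insufficient capacity, which is the sense in which the method coincides with the basic PJM risk.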

9.3.4 Frequency and Fractional Duration Method

The frequency and duration method for short-term reliability evaluation
originated in reference 18 and is also described in reference 23. The
previously described methods calculate the pointwise probability of
generation deficiency. Although the security function method scans
the entire period, the total interval is not considered at a time. The
frequency and duration method, in addition to the pointwise probability of
generation deficiency, also calculates two additional interval-related
indices, interval frequency and fractional duration.

Basic Concepts. The entire sample space X can be partitioned into disjoint
subsets X^+ and X^-. Whenever the system enters any state contained
in X^+, this subset of states is said to have been encountered. The following
indices can now be defined.

Time Specific Probability of X^+. This is the probability of the system
being in any state contained in X^+ at time t,

\[ P_+(t) = \sum_{i \in X^+} P_i(t) \tag{9.49} \]

where P_i(t) is the probability of being in state i at time t. When X^+ is
constituted by states indicating system trouble, (9.49) becomes identical
with (9.48).

Fractional Duration. The fractional duration of X^+ in the interval
(t_1, t_2) is defined as the expected proportion of (t_1, t_2) spent in X^+.
Denoting fractional duration by D_+(t_1, t_2),

\[ D_+(t_1, t_2) = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} P_+(t) \, dt \tag{9.50} \]

Interval Frequency. The interval frequency F_+(t_1, t_2) is defined as the
expected number of encounters of X^+ in (t_1, t_2).

\[ F_+(t_1, t_2) = \int_{t_1}^{t_2} \sum_{i \in X^-} P_i(t) \sum_{j \in X^+} \lambda_{ij} \, dt \tag{9.51} \]

where λ_ij is the constant transition rate from state i to state j.
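
For a single two-state unit that is up at t = 0, the three indices can be evaluated directly, taking X^+ to be the down state. The Python sketch below uses the standard time-dependent probability of a two-state Markov unit and a simple trapezoidal rule for the integrals; the function names, the step count, and the rates are illustrative assumptions.

```python
import math


def p_down(t, lam, mu):
    """Time specific probability of the down state, unit up at t = 0."""
    s = lam + mu
    return lam / s * (1.0 - math.exp(-s * t))


def trapezoid(func, t1, t2, steps=2000):
    """Composite trapezoidal rule on (t1, t2)."""
    h = (t2 - t1) / steps
    total = 0.5 * (func(t1) + func(t2))
    for k in range(1, steps):
        total += func(t1 + k * h)
    return total * h


def fractional_duration(t1, t2, lam, mu):
    """Expected proportion of (t1, t2) spent in the down state."""
    return trapezoid(lambda t: p_down(t, lam, mu), t1, t2) / (t2 - t1)


def interval_frequency(t1, t2, lam, mu):
    """Expected number of entries into the down state in (t1, t2)."""
    return trapezoid(lambda t: (1.0 - p_down(t, lam, mu)) * lam, t1, t2)


# Assumed rates per hour and a four-hour interval.
lam, mu = 0.001, 0.05
print(p_down(4.0, lam, mu),
      fractional_duration(0.0, 4.0, lam, mu),
      interval_frequency(0.0, 4.0, lam, mu))
```

For a multi-unit system the same integrals apply, with the state probabilities obtained from the system model rather than from a closed-form expression.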

Application to Operating Reserve. The relationships (9.49) -(9.51) are gen- 
eral and can be applied to the entire system or parts thereof. The applica- 
tion of these concepts to operating reserve evaluation involves the follow- 
ing two basic steps. 

Generation system model. The generation system model depicts the time-specific probability, the fractional duration, and the interval frequency as functions of the cumulative capacity outages, that is, capacity outages equal to or greater than specific values. The numerical techniques for developing the generation system model are described in references 18 and 22.

Generation reserve model. The load is assumed to be forecast with probability one and to stay constant over the hourly intervals. If the load is forecast with a certain probability distribution, there is no additional difficulty in incorporating this, and if a closer representation is required, the intervals over which the load is assumed constant can be made as small as desired. Since the load is assumed to exist at a certain number of discrete levels and as the capacity states are also discrete, the operating or generation reserve, which is capacity minus the load, would also exist in discrete levels. This can be illustrated by assuming the load for four hours as

Hour   0-1   1-2   2-3   3-4
Load    20    40    50    60

The hours will now be indicated by interval numbers; for example, 0-1 will be denoted by interval #1. When this forecast load is combined with the generation model, the resulting generation reserve will be as shown in Table 9.1. The boundary of any cumulative margin, that is, a margin equal to or less than a specified value, can now be drawn. The boundary of the deficient states, for example, is shown in Table 9.1. If the interval frequency of encountering capacity deficiency is to be determined, then $F_+(0, 4)$ will be determined, where $X^+$ will contain all the states below the boundary.

In general, denoting the capacity associated with the $i$th cumulative capacity outage state by $C_i$, the boundary for cumulative reserve margin $M$, that is, a margin equal to or less than $M$ MW, corresponding to the load $L_j$ during the $j$th interval is fixed by the relationship

$$C_i - L_j \le M$$

That is,

$$C_i \le L_j + M \qquad (9.52)$$



Table 9.1 The Generation Reserve Model of the Example

Interval #                       1      2      3      4
Load                            20     40     50     60

Cumulative Capacity Outage
   0                            55     35     25     15
  25                            30     10      0    -10
  50                             5    -15    -25    -35
  75                           -20    -40    -50    -60

The negative entries are the capacity deficient states.
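The construction of the generation reserve model can be sketched in a few lines. This is only an illustration: the installed capacity of 75 MW is not stated in the text and is inferred here from the table (a 55 MW reserve at a 20 MW load with no outage), and the function names are hypothetical. The `boundary` function applies relation (9.52) interval by interval.

```python
CAPACITY = 75                   # MW; inferred from the table, not stated in the text
LOADS = [20, 40, 50, 60]        # MW, intervals 1..4
OUTAGES = [0, 25, 50, 75]       # MW, cumulative capacity outage levels

def reserve_table(capacity, outages, loads):
    """Reserve (capacity in service minus load) for every outage level and interval."""
    return {o: [capacity - o - load for load in loads] for o in outages}

def boundary(capacity, outages, loads, margin):
    """Per interval, the smallest outage level whose remaining capacity C satisfies
    C <= L + margin, i.e. relation (9.52); None if no level qualifies."""
    result = []
    for load in loads:
        hits = [o for o in outages if capacity - o <= load + margin]
        result.append(min(hits) if hits else None)
    return result

table = reserve_table(CAPACITY, OUTAGES, LOADS)
for outage in OUTAGES:
    print(outage, table[outage])

# A deficiency is a negative reserve; with all levels in 5 MW steps this is margin <= -5.
print(boundary(CAPACITY, OUTAGES, LOADS, margin=-5))   # [75, 50, 50, 25]
```

The returned list gives, for each interval, the first cumulative outage state that lies below the deficiency boundary.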
The expressions for the different indices can be written using the following notation:

$T$ = the lead time
$P_i(t)$ = the probability of the $i$th cumulative capacity outage state at time $t$
$D_i(t_1, t_2)$, $F_i(t_1, t_2)$ = the fractional duration and the interval frequency of encountering the $i$th cumulative capacity outage state in the interval $(t_1, t_2)$
$P_M(T)$ = the probability of the cumulative margin $M$ at the end of lead time
$D_M(0, T)$, $F_M(0, T)$ = the fractional duration and the interval frequency of encountering the cumulative margin $M$ in the interval $(0, T)$, that is, during the lead time.

Probability. The expression for time-specific probability of the cumulative margin $M$ is straightforward:

$$P_M(T) = P_i(T) \qquad (9.53)$$

such that

$$C_i \le L(T) + M$$

where $L(T)$ = the load at time $T$.

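The Fractional Duration. Denoting the time at the end of the $j$th interval by $t_j$, with $t_0 = 0$,

$$D_M(0, T) = \frac{1}{T} \sum_{j=1}^{m} (t_j - t_{j-1})\, D_i(t_{j-1}, t_j) \qquad (9.54)$$

where $m$ is the total number of intervals in the lead time $T$, and the cumulative capacity outage state $i$ during the interval $j$ is determined using (9.52).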

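The Interval Frequency. The state of margin equal to or less than $M$ may be encountered either due to a decrease in capacity or an increase in load. There are, therefore, two components of interval frequency, one due to the generation system transitions, $F_M^g(0, T)$, and one due to the load transitions, $F_M^l(0, T)$, such that

$$F_M(0, T) = F_M^g(0, T) + F_M^l(0, T)$$

The generation component is accumulated interval by interval, and the load component arises at the instants at which the load changes. Let $i$ and $k$ be such that $C_i \le M + L_j$ at the end of the $j$th interval and $C_k \le M + L_{j+1}$ at the beginning of the $(j+1)$th interval.

Finally,

$$F_M(0, T) = \sum_{j=1}^{m} F_i(t_{j-1}, t_j) + \sum_{j=1}^{m-1} \left[ P_k(t_j) - P_i(t_j) \right] \delta_j \qquad (9.55)$$

where $\delta_j = 1$ if $[P_k(t_j) - P_i(t_j)]$ is positive
$= 0$ otherwise

Expressions 9.53-9.55 can be used to determine the three indices for any margin $M$. If $M$ is such that it defines the capacity deficiency states, the three indices are the three risk indices.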

9.4 INTERCONNECTED SYSTEMS

The interconnection of a power system to one or more other systems generally improves the generating capacity reliability. When the power system suffers a loss of load, assistance is generally available from the interconnected systems. The benefits of interconnection result from the diversity in the occurrence of the peak loads and the outages of capacity in different systems.

In evaluating the reliability of interconnected systems, the load and generation in each system are assumed to be connected to a common bus and the tie lines are assumed to connect these buses together. This is shown schematically in Figure 9.11. This means that within each system the transmission system is assumed capable of transmitting the generation to the points of demand. Also, when needed generation is made available to a system from a neighboring system, it is assumed that the intratransmission system can then properly distribute this capacity.

Figure 9.11 Schematic diagram of system A connected to system B.

The type of agreement with the assisting systems affects the evaluation of the reliability of an interconnected system. This discussion assumes a basic agreement that one system helps the other as much as it can without curtailing its own load. The concept may, however, be easily extended to cover other agreements. The discussion in this section is limited to a system A connected to system B, which is generally called a two-area problem. For a detailed discussion of this problem and that on multi-area problems, the reader is referred to references 17, 18, and 3-5. The methods of reliability analysis discussed in this section were developed in references 17 and 18 and later described in references 3-5.



9.4.1 Independent Load Models

The relationships for the probability and frequency of negative margins (load loss) in system A connected to system B are developed assuming the generation and load models in the two systems to be stochastically independent. Assuming the capacity and load in each system to exist at discrete levels, the margin state, which is capacity available less the load on the system, would also exist at discrete levels in each system. In Figure 9.12, $M_a$, $M_b$ contain the margin states in systems A and B, respectively, without including the effect of interconnection assistance. These states are arranged in the order of decreasing reserve, that is,

$$m_{a1} > m_{a2} > \cdots > m_{ai} > \cdots > m_{aN_a}$$

and

$$m_{b1} > m_{b2} > \cdots > m_{bj} > \cdots > m_{bN_b}$$

Figure 9.12 Effective margin states.

The effective margin states in system A, that is, when the tie line is in operation, are given by the elements of matrix $M$,

$$m_{ij} = m_{ai} + h_{ij} \qquad (9.56)$$

where $h_{ij}$ is either the help available to system A from system B or it is the help required by system B from system A. In the latter case $h_{ij}$ has a negative sign. If no assistance is possible from one system to the other, $h_{ij} = 0$. The maximum of $h_{ij}$ is limited by the tie line capability. The effective margin states in system A while the tie line is out are given by the elements of $M'$,

$$m'_{ij} = m_{ai} \qquad (9.57)$$

Equations 9.56 and 9.57 define the boundaries in $M$ and $M'$, respectively, of any effective cumulative margin. In this discussion, $m$ with proper subscript represents an exact margin and $M$ with the same subscript denotes the corresponding cumulative margin; for example, $M_{ij}$ means a margin equal to or less than $m_{ij}$. The probability and frequency equations are derived here for the negative cumulative margin; for the equations for any margin, positive or negative, the reader is referred to reference 3.

Probability of Margin Equal or Less Than N. The probability of a margin equal to or less than $N$ is simply the sum of the probabilities of the margin states comprising this cumulative state. The equation for this probability is easily seen to be

$$P_N = \sum_{i,k} \left[ P_{a(i)} - P_{a(i+1)} \right] \left[ P_{b(k)} A_{ab} + \bar{A}_{ab} \right] \qquad (9.58)$$

where $P_{a(i)}$, $P_{b(k)}$ = probabilities of cumulative margins $M_{a(i)}$ and $M_{b(k)}$, respectively
$A_{ab}$ = the availability of the tie line between A and B
$\bar{A}_{ab} = 1 - A_{ab}$
$i, k$ = the indices defining the boundary of the margins less than or equal to $N$. The index $i$ indicates the margin in array $M_a$ and $k$ indicates the corresponding margin in $M_b$ to give an effective margin $\le N$. For example, the boundary in Figure 9.12 is identified by $(N_a, 1)$, $(N_a - 1, 1)$, ..., (6,1), (5,2), (4,3), and (3,4).

Equation (9.58) can be easily seen to be an application of the conditional probability theorem [23].

Frequency. Define

$f_{a(i)}$, $f_{b(k)}$ = the frequencies of encountering the cumulative margin states $M_{a(i)}$ and $M_{b(k)}$, respectively
$\lambda_{ab}$, $\mu_{ab}$ = the mean failure and repair rates of the tie line

and

$f_N$ = the frequency of encountering an effective margin in system A equal to or less than $N$

System A can transit from one effective margin state to another in any of the following ways.

1. Capacity or load transitions in system A. System A will shift vertically in $M$ when the interconnection is in operation and in $M'$ when the interconnection is on forced outage.

2. Capacity or load transitions in system B. Due to transitions in system B, system A will transfer horizontally from one effective state to another in the matrix $M$ when the interconnection is up. With the interconnection in the down state, system A will transfer horizontally in the matrix $M'$. These latter transitions do not ultimately reflect into the effective operation of system A.

3. Failure or repair of the interconnection. When the interconnection fails or is repaired, system A will transit from a state in $M$ to the corresponding state in $M'$ and vice versa.

The frequency of encountering a cumulative margin in system A equals the expected transitions per unit time across the boundary defining that cumulative margin plus the transitions per unit time associated with the boundary states. The boundary states are a subset of the cumulative margin when the interconnection is down but leave this set as a result of repair of the interconnection.

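Summing the frequency due to the three modes of transition gives the following relationship.

$$f_N = \sum_{i,k} \Big( \left[ f_{a(i)} - f_{a(i+1)} \right] \left[ A_{ab} P_{b(k)} + \bar{A}_{ab} \right] + \left[ P_{a(i)} - P_{a(i+1)} \right] \left\{ A_{ab} f_{b(k)} + \left[ 1 - P_{b(k)} \right] \bar{A}_{ab}\, \mu_{ab} \right\} \Big) \qquad (9.59)$$

Equation 9.59 can be easily derived by the application of the conditional frequency formula [23],

$$f_N = Fr(N/\text{TL Up})\, A_{ab} + Fr(N/\text{TL Down})\, \bar{A}_{ab} + \left[ P(N/\text{TL Down}) - P(N/\text{TL Up}) \right] \mu_{ab} \bar{A}_{ab} \qquad (9.60)$$

where

$Fr(\cdot)$, $P(\cdot)$ are the frequency and probability of $(\cdot)$,
TL = tie line

Now

$$Fr(N/\text{TL Up}) = \sum_{i,k} \left\{ \left[ f_{a(i)} - f_{a(i+1)} \right] P_{b(k)} + \left[ P_{a(i)} - P_{a(i+1)} \right] f_{b(k)} \right\} \qquad (9.61)$$

$$Fr(N/\text{TL Down}) = \sum_{i,k} \left[ f_{a(i)} - f_{a(i+1)} \right] \qquad (9.62)$$

and

$$P(N/\text{TL Down}) - P(N/\text{TL Up}) = \sum_{i,k} \left[ P_{a(i)} - P_{a(i+1)} \right] \left[ 1 - P_{b(k)} \right] \qquad (9.63)$$

It can be easily seen that substitution of (9.61)-(9.63) into (9.60) will yield (9.59).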
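The equivalence of the direct expression and the conditional-frequency decomposition can be checked numerically. The sketch below assumes the forms $P_N = \sum [P_{a(i)} - P_{a(i+1)}][P_{b(k)} A_{ab} + \bar{A}_{ab}]$ for (9.58) and the corresponding sums for (9.59)-(9.63), as written in the code comments. The cumulative probabilities, frequencies, and boundary indices are hypothetical illustration data, lists are 0-based, and the tie-line availability $\mu_{ab}/(\lambda_{ab} + \mu_{ab})$ is the usual two-state result, which the text does not state explicitly.

```python
def exact(cum, i):
    """Exact-state value from cumulative values: cum[i] - cum[i+1] (0 past the last state)."""
    nxt = cum[i + 1] if i + 1 < len(cum) else 0.0
    return cum[i] - nxt

def tie_availability(lam_ab, mu_ab):
    """Steady-state availability of a two-state tie line (assumed model)."""
    return mu_ab / (lam_ab + mu_ab)

def direct(P_a, f_a, P_b, f_b, boundary, A, mu_ab):
    """P_N and f_N summed over the boundary pairs (i, k), forms (9.58) and (9.59)."""
    A_bar = 1.0 - A
    P_N = f_N = 0.0
    for i, k in boundary:
        pa, fa = exact(P_a, i), exact(f_a, i)
        P_N += pa * (A * P_b[k] + A_bar)
        f_N += fa * (A * P_b[k] + A_bar) + pa * (A * f_b[k] + (1.0 - P_b[k]) * A_bar * mu_ab)
    return P_N, f_N

def conditional(P_a, f_a, P_b, f_b, boundary, A, mu_ab):
    """f_N assembled from the conditional terms, forms (9.60)-(9.63)."""
    A_bar = 1.0 - A
    fr_up = sum(exact(f_a, i) * P_b[k] + exact(P_a, i) * f_b[k] for i, k in boundary)
    fr_down = sum(exact(f_a, i) for i, _ in boundary)
    dp = sum(exact(P_a, i) * (1.0 - P_b[k]) for i, k in boundary)
    return fr_up * A + fr_down * A_bar + dp * mu_ab * A_bar

# Hypothetical cumulative data, margins ordered by decreasing reserve.
P_a, f_a = [1.0, 0.2, 0.05, 0.01], [0.0, 0.02, 0.008, 0.002]
P_b, f_b = [1.0, 0.1, 0.02], [0.0, 0.01, 0.003]
boundary = [(3, 0), (2, 1), (1, 2)]        # 0-based (i, k) pairs
A = tie_availability(lam_ab=0.01, mu_ab=2.5)
P_N, f_N = direct(P_a, f_a, P_b, f_b, boundary, A, 2.5)
print(P_N, f_N, conditional(P_a, f_a, P_b, f_b, boundary, A, 2.5))
```

The two frequency values printed at the end agree, which is the substitution argument stated above carried out term by term.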

9.4.2 Correlated Load Models

In the discussion in Section 9.4.1, the load models in the two systems are assumed statistically independent. It is, however, more likely that the loads in the two interconnected systems will bear a correlation. This section develops the relationships for the probability and frequency of a cumulative margin, assuming the loads in the two systems to be perfectly correlated.

Let

$(L_{ax}, L_{bx})$ = perfectly correlated load levels in systems A and B, where $x = 1, 2, \ldots, n$

$M_a^x$, $M_b^x$ = the reserve margin state arrays of systems A and B corresponding to the load condition $(L_{ax}, L_{bx})$

The margin state arrays $M_a^x$ and $M_b^x$ can be obtained by subtracting loads $L_{ax}$ and $L_{bx}$ from the capacity states of the systems A and B, respectively. The arrays $M_a^x$ and $M_b^x$ are shown in Figure 9.13, where the margin states are arranged in the decreasing order of magnitude.

Figure 9.13 Effective margin states matrices for system A connected to system B.

As before, $c$ and $m^x$ with proper subscripts represent exact capacity and conditional exact margin state, and $C$ and $M^x$ with the same subscript denote the corresponding cumulative states. For example, $C_{ai}$ means a capacity equal to or less than $c_{ai}$, and similarly $M_{ij}^x$ means a margin equal to or less than $m_{ij}^x$. The effective margin states in system A given the load condition $(L_{ax}, L_{bx})$ and the interconnection in the up state are given by the elements of matrix $M^x$ of Figure 9.13,

$$m_{ij}^x = m_{ai}^x + h_{ij}^x \qquad (9.64)$$

where $h_{ij}^x$ is either the help available to system A from system B or it is the help required by system B from system A, given the load condition $(L_{ax}, L_{bx})$. The effective margin states in system A given the load condition $(L_{ax}, L_{bx})$ and with the interconnection down are given by the elements of matrix $M'^x$,

$$m_{ij}'^x = m_{ai}^x \qquad (9.65)$$

Equations 9.64 and 9.65 define the boundaries of effective cumulative margin states in $M^x$ and $M'^x$, respectively. The effective margin states in system A given the low load condition $(L_{a0}, L_{b0})$ and the interconnection in the up state are given by the matrix $M^0$ (see Figure 9.13), and the effective margin states for the above condition with the interconnection in the down state are given by $M'^0$.

Probability of Margin Equal or Less Than N. Using (9.58),

$$P_{(N/x)} = \sum_{i_x, k_x} \left[ P_{a(i)}^x - P_{a(i+1)}^x \right] \left[ P_{b(k)}^x A_{ab} + \bar{A}_{ab} \right]$$

where $P_{(N/x)}$ = probability of an effective margin $\le N$, given the load condition $(L_{ax}, L_{bx})$
$i_x, k_x$ = the indices defining the boundary of $N$ in $M^x$; for example, for the boundary marked in Figure 9.13 the indices are $(N_a, 1)$, ..., (6,1), (5,2), (4,3), and (3,4)
$P_{a(i)}^x$, $P_{b(k)}^x$ = probabilities of $M_{ai}^x$ and $M_{bk}^x$, given the load condition

Since

$$m_{ai}^x = c_{ai} - L_{ax}$$

and

$$m_{bk}^x = c_{bk} - L_{bx}$$

it can be seen that $P_{a(i)}^x$, $P_{b(k)}^x$ are probabilities of $C_{ai}$ and $C_{bk}$. Using conditional probability, for $n$ load levels and low load level ($x = 0$),

$$P_N = \sum_{x=0}^{n} P_{(N/x)} A_x$$

where $A_x$ = probability of load condition $(L_{ax}, L_{bx})$.

Frequency. The two different modes of state transition in system A for a given load condition $(L_{ax}, L_{bx})$ are

1. The generation system transitions in system A or system B.

2. The state transition due to the failure or the repair of the tie line.

The frequency of encountering any cumulative effective margin in system A due to these two modes can be determined using (9.59),

$$f_{(N/x)} = \sum_{i_x, k_x} \Big( \left[ f_{a(i)}^x - f_{a(i+1)}^x \right] \left[ A_{ab} P_{b(k)}^x + \bar{A}_{ab} \right] + \left[ P_{a(i)}^x - P_{a(i+1)}^x \right] \left\{ A_{ab} f_{b(k)}^x + \left[ 1 - P_{b(k)}^x \right] \bar{A}_{ab}\, \mu_{ab} \right\} \Big)$$

where $f_{(N/x)}$ = the frequency of encountering an effective margin $\le N$, given the load condition $(L_{ax}, L_{bx})$
$f_{a(i)}^x$, $f_{b(k)}^x$ = the frequencies of encountering $M_{ai}^x$ and $M_{bk}^x$, respectively, given the load condition $(L_{ax}, L_{bx})$
= the frequencies of encountering $C_{ai}$ and $C_{bk}$, respectively

The frequency of encountering the effective margin $\le N$ with the load condition $(L_{ax}, L_{bx})$ is

$$f_N^x = f_{(N/x)} A_x$$

The frequency due to the two modes listed previously is given by the summation over all the load conditions, that is,

$$f_{cN} = \sum_{x} f_N^x$$

This $f_{cN}$ represents the contribution to $f_N$ (the frequency of encountering the margin $\le N$) due to the generation transitions and also takes into account transitions due to failure or repair of the interconnection. It does not, however, take into account the contribution due to the load transitions.

The contribution due to the load transitions. The peak loads are assumed to be followed by the low load period. Thus the system can transit from a load condition $(L_{ai}, L_{bi})$ to $(L_{a0}, L_{b0})$ and again to some load condition $(L_{aj}, L_{bj})$. It should be kept in mind that there are no interpeak transitions, that is, the system cannot transit directly from $(L_{ai}, L_{bi})$ to $(L_{aj}, L_{bj})$. The contribution to $f_N$ will thus result from the transition of system A from a given load condition to the low load condition and vice versa.

Let the boundaries of $N$ in $M^x$ and $M^0$ be marked as in Figure 9.13. The states lying between these two boundaries belong to the cumulative margin under the load condition $(L_{ax}, L_{bx})$ but not under the low load condition, so the boundary is crossed whenever the load changes while the system is in one of these states. The contribution due to the transition from load condition $(L_{ax}, L_{bx})$ to the low load condition and vice versa is governed by the probability of these states. It can be seen that this probability is

$$P_{(N/x)} - P_{(N/0)}$$

Therefore, the contribution for the load condition $(L_{ax}, L_{bx})$ is

$$\frac{A_x}{e} \left[ P_{(N/x)} - P_{(N/0)} \right]$$

where $e$ is the load exposure factor. The total contribution can be found by summation over all the load conditions, and is

$$\sum_{x=1}^{n} \frac{A_x}{e} \left[ P_{(N/x)} - P_{(N/0)} \right]$$

Adding this to $f_{cN}$,

$$f_N = f_{cN} + \sum_{x=1}^{n} \frac{A_x}{e} \left[ P_{(N/x)} - P_{(N/0)} \right]$$
The step by step procedure for evaluating the reliability indices in system A with correlated load models is outlined as follows.

1. The low load condition $(L_{a0}, L_{b0})$ is selected first and from each capacity state of system A the load $L_{a0}$ is subtracted to obtain $M_a^0$. $M_b^0$ is obtained in a similar manner. The boundary of $N$ can then be fixed using (9.64) and (9.65).

2. $P_{(N/0)}$ and $f_{(N/0)}$ are evaluated using equations (9.67) and (9.68). Here $N$ represents the effective margin $\le$ the first negative margin in $M^0$ or $M'^0$. If the low loads in the two systems are assumed zero, $P_{(N/0)} = f_{(N/0)} = 0$.

3. $P_{(N/x)}$ and $f_{(N/x)}$ for every load condition are evaluated in the same manner as outlined for the low load state.

4. The probability and the frequency of the failure state in system A can be finally found using (9.67) and (9.69). Assuming the low load level in both the systems to be zero, these equations simplify to

$$P_N = \sum_{x=1}^{n} A_x P_{(N/x)}$$

and

$$f_N = \sum_{x=1}^{n} A_x \left[ f_{(N/x)} + P_{(N/x)} / e \right]$$

where $e$ = the load exposure factor.
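Step 4 can be sketched as follows, assuming the simplified zero-low-load expressions take the form $P_N = \sum_x A_x P_{(N/x)}$ and $f_N = \sum_x A_x [f_{(N/x)} + P_{(N/x)}/e]$. Two further assumptions are made and are not taken from the text: the probability of load condition $x$ is computed as $A_x = n_x e / D$ (number of occurrences times exposure factor over the period length), and the conditional indices in the example are invented numbers used only to exercise the formulas.

```python
def load_condition_probabilities(occurrences, period_days, exposure):
    """A_x = n_x * e / D for each peak load condition (assumed load model)."""
    return [n * exposure / period_days for n in occurrences]

def failure_indices(A, P_cond, f_cond, exposure):
    """Combine conditional indices over the load conditions (zero low load assumed).

    A:      probabilities A_x of the peak load conditions
    P_cond: P_(N/x), probability of margin <= N given load condition x
    f_cond: f_(N/x), frequency of margin <= N given load condition x
    """
    P_N = sum(a * p for a, p in zip(A, P_cond))
    f_N = sum(a * (f + p / exposure) for a, p, f in zip(A, P_cond, f_cond))
    return P_N, f_N

# Four load conditions over a 20-day period with a 0.5-day exposure factor.
A = load_condition_probabilities([8, 4, 4, 4], period_days=20, exposure=0.5)
print(A, 1.0 - sum(A))          # peak-condition probabilities and low-load probability

# Hypothetical conditional indices, highest load first.
P_cond = [2.0e-3, 4.0e-4, 1.0e-4, 3.0e-5]
f_cond = [1.5e-3, 3.0e-4, 8.0e-5, 2.0e-5]     # per day
P_N, f_N = failure_indices(A, P_cond, f_cond, exposure=0.5)
print(P_N, f_N)
```

The term $P_{(N/x)}/e$ is the load-transition contribution: with a mean peak duration of $e$ days, the system leaves a peak load condition at rate $1/e$.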

9.4.3 System Studies

The techniques described in Sections 9.4.1 and 9.4.2 have been implemented in a computer program [21] and several studies based on this program have been reported in references 17, 18, and 3-5. A typical study, the effect of tie line capacity on risk level in system A, is reported here.

A system designated A is assumed to be connected to an identical system B by a single tie line. The mean failure and repair rates of the tie line are assumed to be 0.01 and 2.5 per day, respectively. The description of the generation system and load model in each system is provided in Table 9.2.

Table 9.2 Generation System

No. of Identical   Unit Size   Mean Down Time   Mean Up Time
Units              (MW)        (Years)          (Years)

1                  250         0.06             2.94
3                  150
2                  100
4                   75
9                   50
3                   25

Total number of units = 22
Total installed capacity = 1725 MW

Load System

Exposure factor = 0.5 day
Period = 20 days

Load Condition                    No. of
(x)              (MW, MW)         Occurrences

1                (1450, 1450)     8
2                (1255, 1255)     4
3                (1155, 1155)     4
4                (1080, 1080)     4

The low load level in both the systems was assumed to be at zero MW.

The study was carried out by varying the tie capability from 25 to 625 MW. The mean failure and repair rates of the tie were maintained at 0.01 and 2.5 per day. The results of this study are shown in Figures 9.14 and 9.15. The curves representing the correlated load models are labeled 1 and those corresponding to the independent load models 2. With the lower values of tie line capability, the system is closer to being isolated rather than interconnected. The effect of the interconnection is, therefore, not significant and the difference between the two sets of results is not discernible. As the tie capacity is increased, the interconnection becomes more effective and, around 125 MW, the two results begin to deviate significantly. It can be seen from Figures 9.14 and 9.15 that beyond 250 MW there is no marked improvement in reliability indices for curve 1. This is then the practical limit for tie capability with the correlated load models and in this case it is reasonably close to the independent load models condition. The limiting values of unavailability and cycle time for the two cases are, however, significantly different. The independence assumption gives optimistic results as compared with the correlated load models.

Figure 9.14 Variation of risk level (unavailability) in system A with the variation of tie line capability. (Peak in system A = 1450 MW; peak in system B = 1450 MW; MFR of tie line = 0.01 failure/day; MRR of tie line = 2.50 repairs/day. Curves: 1, correlated; 2, independent. Horizontal axis: tie line capability, MW.)

Figure 9.15 Variation of risk level (cycle time) in system A with the variation of tie line capability. (Same system data and curve labels as Figure 9.14.)



9.5 TRANSMISSION AND DISTRIBUTION SYSTEMS 



A number of techniques have been proposed for the quantitative evaluation of transmission and distribution system reliability. It is now generally accepted that within the bounds of distributional assumptions, the Markov approach is the most accurate. If the fluctuating environment is not included in the analysis, the transmission system elements can be considered independent and the probability and frequencies calculated directly and simply. In the case of independent components, cut set or tie set methods can also be effectively utilized. When, however, the fluctuating environment, stormy and normal weather, is considered, the statistical behavior of the components cannot be regarded independent and the solution of $2^{n+1}$ linear algebraic equations is required, where $n$ is the number of components.

When the number of components is large, the number of linear equations becomes unmanageable. Methods like state merging, state space truncation, and sequential truncation have been proposed for alleviating this problem and are described in reference 19. The most efficient method for dealing with transmission and distribution systems involving dependent modes, like the fluctuating environment, is the Markov Cut Set method [27]. This method is a combination of the cut set and Markov methods. This composite approach consists in decomposing the system by cut sets and then using Markov processes and frequency balancing for the calculation of the terms in the cut set expansion. The Markov process of only the cut set members is considered and, therefore, a limited number of equations need to be solved at a time. A very useful feature of this approach is that both time-specific and steady-state probabilities and frequencies of system failure can be calculated. It is also possible to measure the degree of accuracy of the results. The method is illustrated for transmission systems exposed to a 2-state fluctuating environment. The method can, however, be used to deal with dependence due to maintenance outages and common mode failures.

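9.5.1 Minimal Cut Set Method

The equations for the steady-state probability and average frequency of system failure are

$$P_f = \sum_{i} \Pr(C_i) - \sum_{i<j} \Pr(C_i \cap C_j) + \sum_{i<j<k} \Pr(C_i \cap C_j \cap C_k) - \cdots + (-1)^{m-1} \Pr(C_1 \cap C_2 \cap \cdots \cap C_m) \qquad (9.70)$$

and

$$f_f = \sum_{i} \Pr(C_i)\, \bar{\mu}_i - \sum_{i<j} \Pr(C_i \cap C_j)\, \bar{\mu}_{i+j} + \sum_{i<j<k} \Pr(C_i \cap C_j \cap C_k)\, \bar{\mu}_{i+j+k} - \cdots + (-1)^{m-1} \Pr(C_1 \cap C_2 \cap \cdots \cap C_m)\, \bar{\mu}_{1+2+\cdots+m} \qquad (9.71)$$

where $P_f$, $f_f$ = probability and frequency that the system has failed
$C_i$ = cut set $i$ and also the event: all components of $C_i$ are failed
$m$ = number of cut sets
$\Pr(C_i \cap C_j)$ = probability of the components of both $C_i$ and $C_j$ failed
$\mu_j$ = repair rate of component $j$
$\bar{\mu}_i$ = sum of the repair rates of the components of $C_i$; $\bar{\mu}_{i+j}$ = sum of the repair rates of the components belonging to $C_i$ or $C_j$, and so on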

The min cut sets can be calculated using Failure Mode and Effect Analysis, and for some well-defined reliability block diagrams specific algorithms are also available. The min cut sets are defined as sets of minimum number of components whose outage will result in loss of continuity for the system. Once the min cut sets have been determined, the probability and frequency of system failure can be determined using (9.70) and (9.71). The mean duration of the failure state can be determined using

$$d_f = \frac{P_f}{f_f} \qquad (9.72)$$

For practical applications, $\lambda_i / \mu_i \ll 1$ and the upper bounds [24] to probability and frequency of failure give results very close to the exact values. These upper bounds are

$$P_{fu} = \sum_{i} \Pr(C_i) \qquad (9.73)$$

and

$$f_{fu} = \sum_{i} \Pr(C_i)\, \bar{\mu}_i \qquad (9.74)$$

where $P_{fu}$, $f_{fu}$ are the first upper bounds to probability and frequency of system failure.

The interval in which $P_f$ and $f_f$ lie can be determined by calculating the lower bounds as well,

$$P_{fl} = \sum_{i} \Pr(C_i) - \sum_{i<j} \Pr(C_i \cap C_j) \qquad (9.75)$$

and

$$f_{fl} = \sum_{i} \Pr(C_i)\, \bar{\mu}_i - \sum_{i<j} \Pr(C_i \cap C_j)\, \bar{\mu}_{i+j} \qquad (9.76)$$
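The cut set expansion and its first bounds can be sketched for independent components as below. The sketch assumes the expansion has the standard inclusion-exclusion form, with the probability of an intersection equal to the product of the unavailabilities of all components involved and the associated repair rate equal to the sum of their repair rates. The three cut sets and the rates are hypothetical, and `brute_force` is only a check by full state enumeration, not part of the method.

```python
from itertools import combinations, product

def order_terms(cut_sets, q, mu, order):
    """Sums, over all intersections of `order` cut sets, of Pr(.) and Pr(.) * mu_bar."""
    p_sum = f_sum = 0.0
    for combo in combinations(cut_sets, order):
        members = set().union(*combo)          # components failed in the intersection
        pr = 1.0
        for c in members:
            pr *= q[c]                          # independence assumed
        p_sum += pr
        f_sum += pr * sum(mu[c] for c in members)
    return p_sum, f_sum

def cut_set_indices(cut_sets, q, mu, max_order=None):
    """Inclusion-exclusion truncated at max_order (None keeps every term)."""
    max_order = len(cut_sets) if max_order is None else max_order
    P = F = 0.0
    for order in range(1, max_order + 1):
        p, f = order_terms(cut_sets, q, mu, order)
        sign = 1.0 if order % 2 else -1.0
        P += sign * p
        F += sign * f
    return P, F

def brute_force(cut_sets, q, mu):
    """Failure probability and frequency by enumerating all component states."""
    comps = sorted(q)
    failed = lambda down: any(cs <= down for cs in cut_sets)
    P = F = 0.0
    for bits in product((0, 1), repeat=len(comps)):
        down = {c for c, b in zip(comps, bits) if b}
        if not failed(down):
            continue
        pr = 1.0
        for c in comps:
            pr *= q[c] if c in down else 1.0 - q[c]
        P += pr
        # Only repairs that restore the system cross the failure boundary.
        F += pr * sum(mu[c] for c in down if not failed(down - {c}))
    return P, F

lam = {1: 0.02, 2: 0.03, 3: 0.001, 4: 0.05}     # hypothetical failure rates
mu = {1: 1.0, 2: 2.0, 3: 0.5, 4: 4.0}           # hypothetical repair rates
q = {c: lam[c] / (lam[c] + mu[c]) for c in lam}  # component unavailabilities
cuts = [frozenset({1, 2}), frozenset({3}), frozenset({2, 4})]

P_f, f_f = cut_set_indices(cuts, q, mu)
print(P_f, f_f, P_f / f_f)                       # probability, frequency, mean duration
print(cut_set_indices(cuts, q, mu, 1))           # first upper bounds
print(cut_set_indices(cuts, q, mu, 2))           # first lower bounds
print(brute_force(cuts, q, mu))                  # enumeration check
```

With small ratios of failure rate to repair rate the first upper bound is already close to the full expansion, which is the practical point made in the text.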

Increasingly closer upper and lower bounds to $P_f$ and $f_f$ can be obtained by the successive addition of odd and even order terms [24].

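9.5.2 The Markov Cut Set Method

When the components are exposed to a fluctuating environment, their failures cannot be regarded statistically independent.

System state space. Consider a cut set $C_i$ with, say, two members $l$ and $m$. The failure of the members of $C_i$, irrespective of the states of the other components of the system, is equivalent to the system being in subset $S_i$ of the state space $S$, where

$S_i$ = {$s_j$: in the state $s_j$, the components $l$ and $m$ are failed and the other components exist in either state}

Starting from the state in which the members of $C_i$ are failed and all the other components are good, the remaining states of $S_i$ are generated by the downward transitions from that state.

It can be seen from the discussion that

$$\Pr(C_i) = \Pr(S_i) \qquad (9.77)$$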
where $\Pr(S_i)$ = probability of the system being in $S_i$

$$\Pr(S_i) = \sum_{j \in S_i} \Pr(s_j) \qquad (9.78)$$

where $\Pr(s_j)$ = probability of being in system state $j$.

The frequency formula, not dependent on component independence, can also now be stated in terms of $S_i$,

$$f_f = \sum_{i} F(S_i) - \sum_{i<j} F(S_i \cap S_j) + \sum_{i<j<k} F(S_i \cap S_j \cap S_k) - \cdots + (-1)^{m-1} F(S_1 \cap S_2 \cap \cdots \cap S_m) \qquad (9.79)$$

where $F(S_i)$ = frequency of encountering subset $S_i$.

Equation 9.79 is true whether or not the components are statistically independent.

The Method. The problem of calculating the probabilities and frequencies of cut sets and their intersections can be transformed into that of determining these values for the corresponding equivalent subsets using (9.77)-(9.79). It is now proposed that the Markov and frequency balancing approach be used for the calculation of probabilities and frequencies of equivalent subsets.

Consider, for example, subset $S_i$ equivalent of cut set $C_i$. If $\Pr(S_i)$ and $F(S_i)$ could be calculated from the transition rate matrix of only those components that are members of $C_i$, this would present a big step forward. This is because in a large network, the number of elements in a minimal cut set is generally much smaller than the number of components in the whole network. The number of components to be dealt with at a time can also be kept within reasonable limits by excluding minimal cut sets beyond a specified order. As an example, assume that the number of components in a system is 50. The total number of states, if the equations for the entire system are to be solved, is $2^{51}$, that is, $225 \times 10^{13}$, when the 50 components are exposed to the same 2-state fluctuating environment. Now if the largest cut set to be considered is of the order 5, then only $2^{5+1} = 64$ states need to be dealt with at a time using the proposed method. The solution of 64 linear equations on a modern digital computer is a trivial task.

It can be shown that in a system exposed to a 2-state fluctuating environment $\Pr(S_i)$ and $F(S_i)$ can indeed be calculated by developing the transition rate matrix for the elements of $C_i$ only and neglecting other components of the system.
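Let $\alpha$ be the number of components comprising $C_i$. Let $X_a$ denote the set of these components and $X_b$ the set of the remaining $n - \alpha$ components, which are not members of $C_i$. Here $n$ is the total number of components. The sets $X_a$ and $X_b$ generate the same configurations in the two weather states, but the interstate transition rates in the two weather states are different. In the state space corresponding to $X_a$, the configurations are indicated by $a_1, a_2, \ldots, a_p$ in the normal weather condition and by $a'_1, a'_2, \ldots, a'_p$ in the adverse weather condition, $p$ being equal to $2^{\alpha}$; the transition rate from configuration $a_i$ to $a_j$ is indicated by $\alpha_{ij}$ in the normal weather and by $\beta_{ij}$ in the adverse weather. The states generated by $X_b$, that is, components not members of $C_i$, are indicated by $b_1, b_2, \ldots, b_q$ in the normal weather condition and $b'_1, b'_2, \ldots, b'_q$ in the adverse weather condition, $q$ being equal to $2^{n-\alpha}$.

The state space of the entire system can now be generated from the state space of $X_a$ and $X_b$. When $X_a$ and $X_b$ are combined, there will be $p \times q$ states in each weather condition. This combination of states for the normal and adverse weather is shown below.

Normal Weather State Space

$b_1 a_1, \; b_1 a_2, \; \ldots, \; b_1 a_p$
$b_2 a_1, \; b_2 a_2, \; \ldots, \; b_2 a_p$
$\vdots$
$b_q a_1, \; b_q a_2, \; \ldots, \; b_q a_p$

Adverse Weather State Space

$b'_1 a'_1, \; b'_1 a'_2, \; \ldots, \; b'_1 a'_p$
$b'_2 a'_1, \; b'_2 a'_2, \; \ldots, \; b'_2 a'_p$
$\vdots$
$b'_q a'_1, \; b'_q a'_2, \; \ldots, \; b'_q a'_p$

The transition rate from $b_i a_j$ to $b_i a_k$ is $\alpha_{jk}$, that is, the transition rate from $a_j$ to $a_k$. The transition rate from $b_i a_j$ to $b'_i a'_j$ is $w = 1/T$, where $T$ is the mean duration of the normal weather. The transition rate from $b'_i a'_j$ to $b_i a_j$ is likewise $w' = 1/T'$, where $T'$ is the mean duration of the adverse weather.

Let the states be grouped into subsets $D_j$ and $D'_j$ in the normal and adverse weather conditions, $j = 1, 2, \ldots, p$. These subsets are such that

$$D_j = \{ b_1 a_j, \; b_2 a_j, \; \ldots, \; b_q a_j \}$$

and

$$D'_j = \{ b'_1 a'_j, \; b'_2 a'_j, \; \ldots, \; b'_q a'_j \}$$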
The necessary and sufficient condition [23] for the states to be mergeable into subsets is that for any two subsets $D_i$, $D_j$ the transition rate from each state in subset $D_i$ to each of the states in $D_j$, when summed over all states in $D_j$, is the same for each state in $D_i$, and this equivalent transition rate from $D_i$ to $D_j$ is given by

$$\lambda_{D_i D_j} = \sum_{m \in D_j} \lambda_{lm}, \qquad l \in D_i$$

When this condition of mergeability is satisfied between all the subsets taken in pairs, the Markov process of the system is said to be mergeable into these disjoint subsets. Let us apply this condition of mergeability to $D_j$ and $D'_j$, $j = 1, \ldots, p$.

1. For any two $D_i$, $D_j$, $\sum_{m \in D_j} \lambda_{lm}$ is the same for each $l \in D_i$ and is equal to $\alpha_{ij}$, which is the normal weather transition rate from $a_i$ to $a_j$. Therefore the condition of mergeability is satisfied for any $D_i$, $D_j$ and the transition rate from subset $D_i$ to $D_j$ is $\alpha_{ij}$, that is, the transition rate from $a_i$ to $a_j$.

2. In a similar manner the condition of mergeability is satisfied between any two $D'_i$ and $D'_j$ and the transition rate from $D'_i$ to $D'_j$ is $\beta_{ij}$, that is, the adverse weather transition rate from $a'_i$ to $a'_j$.

3. For any pair $D_i$ and $D'_i$, the condition of mergeability is also satisfied and the transition rate from $D_i$ to $D'_i$ is $w$ and from $D'_i$ to $D_i$ is $w'$.
From the preceding discussions, it can be concluded that the Markov process for the system is mergeable into subsets $D_j$, $D'_j$. It can also be recognized that the merged Markov process is identical to the Markov process for components of $X_a$, that is, components that are members of cut set $C_i$. Now if the states $a_p$ and $a'_p$ represent the failure of all the elements of $C_i$ in the normal and adverse weather respectively, then

$D_p$ = {subset of states in the normal weather having members of $C_i$ failed and other components of the system in either failed or good state}

$D'_p$ = {subset of states in the adverse weather having members of $C_i$ failed and other components of the system in either of states}

That is,

$$S_i = D_p \cup D'_p = D_p + D'_p$$

The merged process is identical to the process corresponding to the members of cut set $C_i$. Therefore

$$\Pr(S_i) = \Pr(D_p) + \Pr(D'_p) = \Pr(a_p) + \Pr(a'_p) \qquad (9.80)$$

$$F(S_i) = F(D_p \cup D'_p) = F(a_p \cup a'_p) \qquad (9.81)$$

Here $F(a_p \cup a'_p)$ is the frequency of encountering the state where all the elements of $C_i$ are failed and can be readily calculated using the frequency balancing concept. For steady state, the frequency of encountering this state equals the frequency of leaving it, so that

$$F(a_p \cup a'_p) = \Pr(a_p) \sum_{j \ne p} \alpha_{pj} + \Pr(a'_p) \sum_{j \ne p} \beta_{pj} \qquad (9.82)$$

where $\alpha_{pj}$ and $\beta_{pj}$ are the normal and adverse weather transition rates out of $a_p$ and $a'_p$.


From (9.80) and (9.81) it can be seen that the probability and frequency of a cut set $C_i$, for the system exposed to fluctuating environment, can be obtained by considering the Markov process of the components of cut set $C_i$ only and that the transition rate matrix of the entire system need not be generated. Therefore, the terms in (9.70) and (9.71) can be computed by generating the transition rate matrix of the elements of each cut set or intersection at a time and, as noted previously, these matrices are much smaller in size than the matrix for the entire system. It is to be noted that since the necessary and sufficient condition of mergeability is satisfied, (9.80) and (9.81) can be used for both time specific and steady state.

It is, however, obvious that the higher the order of intersections considered, the less advantageous the procedure becomes, since the number of components to be considered at a time increases. Therefore, for successful implementation, the following procedure is suggested.

PROCEDURE 

1 Identify the cut sets to be considered. The cut sets having more than x 
cornet may be ignored. It will be reasonable to ^ce 
the probability of more than 5 overlapping outages can safely be 
regarded negligible. . 

2 Since in all practical systems the component failure rate is much smaller 
IZl lU rate, the upper bound will give an almost exact resu.L 
Therefore P f and f f can be approximated as 
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(9.84) 



3. The terms of (9.83) and (9.84) can be calculated by generating the
transition rate matrix of the components of each cut set at a time and
computing $\Pr\{C_i\}$ and $F\{C_i\}$ using (9.80) and (9.81).

It can be seen that if the above procedure is followed only $2^{x+1}$
equations need be solved at a time, x being the number of elements in the cut
set. If x = 5, it means 64 equations, which is a trivial task when digital
computers are employed. The calculation of the first lower bound will not
involve much additional difficulty and can provide insight on the margin
of error.

It should be carefully noted that (9.80) and (9.81) for the calculation of
the terms of (9.70) and (9.71) or (9.83) and (9.84) are exact. The
approximation involved is either in the ignoring of higher order cut sets or
in using the upper bound approximations (9.83) and (9.84) instead of the
complete (9.70) and (9.71).

Comparison with State Space Truncation. If the cut sets of, say, order 6 or
higher were to be ignored, one might ask, "How is the Markov cut set
approach superior to state space truncation when contingencies of order
higher than 5 are ignored?" Consider a system of, say, n components
exposed to normal and adverse weather. If contingencies of order
higher than x are ignored, the number of linear equations involved is

$$\sum_{k=0}^{x} \binom{n}{k}$$

If, for example, n = 50 and x = 5, the total number of states or
corresponding equations is 2369936, which is beyond the capability of today's
computers. On the other hand, using the Markov cut set approach only 64
equations need be considered simultaneously, which, for state space
truncation, corresponds to considering only single-order contingencies.
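The two counts quoted above can be checked with a few lines of code. This is a minimal sketch, assuming that the truncated state count is the number of ways of choosing at most x failed components out of n, and that n = 50 (an assumption chosen because it reproduces the quoted figure of 2369936). The function names are illustrative.

```python
from math import comb


def cut_set_equations(x: int) -> int:
    """Equations solved at a time for a cut set with x elements (2**(x+1))."""
    return 2 ** (x + 1)


def truncated_state_count(n: int, x: int) -> int:
    """States kept when contingencies of order higher than x are ignored.

    Assumption: one state per combination of at most x failed components.
    """
    return sum(comb(n, k) for k in range(x + 1))


print(cut_set_equations(5))          # 64
print(truncated_state_count(50, 5))  # 2369936
```

The gap between the two numbers is the practical argument for working with one cut set at a time.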

APPROXIMATIONS AND EXTENSIONS. The Markov cut set method has been
shown to deal in an exact manner with the form of dependency induced by
the fluctuating environment. Even though this method may not exactly
apply to all forms of dependency, it could provide good approximations
for certain limited forms of component dependence due to maintenance
outages and common-mode failures.

Example. The Markov cut set method is illustrated by application to a
complex configuration shown in Figure 9.16. The cut sets of this system
are identified in Table 9.3. The relevant data is shown in Table 9.4, which
also presents the results by the exact Markov and Markov cut set methods.
The results by the Markov cut set method are only slightly higher than
those by the Markov method. The mean duration of the system failure can be
obtained by (9.72). The error is somewhat higher for the higher percentage
of failures during adverse weather. This is to be expected, as it results in
effectively increasing the ratio of failure rate to repair rate. It should be
reiterated here that any error introduced by the Markov cut set method is
because of the use of upper bound approximations and not in the calculation
of terms of the equations for the probability and frequency of failure.

Figure 9.16 An example of a complex configuration.



Table 9.3 Minimal Cut Sets of Figure 9.16

Cut Set     Component Members
C_1         1, 2
C_2         3, 4
C_3         1, 4, 5
C_4         2, 3, 5



Table 9.4 Comparison of Markov and Markov Cut Set Methods
Components are assumed identical. Average failure rate = [value illegible in source].
Normal weather mean duration = 200 hours. Adverse weather mean duration
= 1.5 hours.

Mean Down Time   Failures During   Failure Probability                Failure Frequency (per year)
of Each          Adverse
Component        Weather, %        Markov           Markov Cut Set    Markov           Markov Cut Set
(hours)

 5               20                3.4996 x 10^-7   3.4996 x 10^-7    1.2275 x 10^-3   1.2275 x 10^-3
 5               80                3.3144 x 10^-6   3.3157 x 10^-6    1.1674 x 10^-2   1.1684 x 10^-2
10               20                1.0736 x 10^-6   1.0736 x 10^-6    1.8830 x 10^-3   1.8831 x 10^-3
10               80                7.7573 x 10^-6   7.7633 x 10^-6    1.3678 x 10^-2   1.3695 x 10^-2
ON QUALITY OF SERVICE. The method discussed can be used for both
continuity of service criteria as well as indices considering the voltage
levels. The difference lies in the calculation of minimal cut sets. The
process proceeds in essentially two steps. First a set of components is
assumed out of service due to forced outage or maintenance outage. Given
this event, the load level that will cause an unacceptable voltage level is
then determined using load flow. Repetition of this procedure for different
sets of component outages then identifies the minimal cut sets in terms of
component outages and load levels. Once the minimal cut sets have been
identified, the Markov cut set method can be used.
10.1 INTRODUCTION

Reliability is an important consideration in the planning, design, and
operation of transit systems. The discussion in this chapter is focused on
track bound transit systems; the principles can, however, be applied to
other types of transit systems as well. The term track bound is used here to
describe systems whose vehicles are captive on a common track. This
includes steel wheel on steel rail, rubber wheel on concrete guideway, and
magnetically levitated vehicles. In the case of a road system, the failure of
a vehicle affects the concerned vehicle and some delay may be caused to
the other vehicles. The effect on the system is, however, more or less
localized since the failed vehicle can be pulled to the side or bypassed by
the other vehicles. The bypass capability of the track bound systems, on
the other hand, is extremely limited. The failure of a single vehicle in such
systems could affect or immobilize the upstream vehicles and, depending
upon network configuration, the degrading or immobilizing effect could
spread over the entire or a major part of the system. This serial effect
makes reliability an all the more important consideration in track
bound transit systems.

Reliability is important for both the transit operator and the passengers.
Lower reliability means increased unscheduled maintenance and decreased
equipment availability. If availability is low, more vehicles are needed to
meet the passenger demand, but even with more vehicles, system
performance may not be satisfactory. More vehicles can increase system
availability but do not decrease the incidence of system failures.
Reliability is important to passengers as it reflects the ability of a transit
system to keep operating schedules.

Traditionally, the transit operators have been relying on warranties to
assure the procurement of reliable equipment. The warranties are, however,
more like maintenance or service contracts and do not necessarily
serve as deterrents to system unreliability. Warranty makes the
manufacturer pay for repairs during a limited period of time, but once the
warranty period is over, unreliability becomes the headache of the transit
operator. The operators are now realizing that economical reliability can be
built into the systems only during the design, development, and manufacturing
phase of the equipment. The solution lies in specifying reliability targets
and implementing reliability programs during design and development. The
models for the analysis of design of the components of a transit system and
the techniques of combining these models are described in this chapter.

The models are discussed with regard to a configuration in which the track
is a single loop with passenger stations (Figure 10.1). From the viewpoint of
reliability, a configuration in which the vehicles move on one track in one
direction and are then switched to another track for the opposite direction
is generally regarded as equivalent to a single loop. The methodology can
be used to develop models for more complex configurations. The transit
system is divided into the following subsystems:

1. Vehicle fleet.

2. Passenger stations.

3. Substations for power supply.

4. Command and control.

5. Guideway.

These subsystems, in turn, may be divided into components. Models for each
subsystem are described in this chapter.

10.2 BASIC THEORY

The models consider the operating vehicles, spare vehicles, and vehicles on
maintenance, and are combined with the models for the other subsystems.
There is a considerable statistical dependence between the various states
because of failure, retrieval, and maintenance. The state space approach is,
therefore, used in developing these models.
In this approach, the possible states of the system (state space) and the
modes of transiting from one state to another are identified. Each mode of
interstate transition is assigned a specific value called its interstate
transition rate. The state equations can be written using the frequency
balancing approach [10]. The frequency of transiting from state i to state j
is defined as the expected transition rate from state i to state j and is
given by $P_i \lambda_{ij}$, where $\lambda_{ij}$ is the transition rate from state
i to state j. The frequency balancing approach states that the rate of change
of the probability of being in state i equals the frequency of transiting
into state i from all the remaining states minus the frequency of transiting
out of state i, that is,

$$\sum_j P_j(t)\,\lambda_{ji} - P_i(t) \sum_j \lambda_{ij} = \dot{P}_i(t) \tag{10.1}$$

In the equilibrium condition, $\dot{P}_i(t) = 0$, and therefore (10.1) reduces to

$$\sum_j P_j \lambda_{ji} - P_i \sum_j \lambda_{ij} = 0 \tag{10.2}$$

that is, the frequency of encountering state i equals the frequency of
encountering the rest of the state space from state i. For n states there
are n equations and they are linearly dependent; any equation can be
obtained from the remaining (n - 1) equations. Any (n - 1) equations
together with the total probability equation,

$$\sum_i P_i = 1$$

can be solved to obtain the state probabilities. These state probabilities can
be used to obtain the reliability measures.
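The solution procedure just described can be sketched in code. The following is a minimal illustration, not the program referred to in this chapter: it builds the frequency balance equations from a table of transition rates, replaces one of the dependent equations by the total probability equation, and solves by Gauss elimination. The function name and the two-state example data are assumptions made for illustration.

```python
def steady_state(rates):
    """Steady-state probabilities from the frequency balance equations.

    rates[i][j] is the transition rate from state i to state j (i != j);
    diagonal entries are ignored.
    """
    n = len(rates)
    # Row i: sum_j P_j * rate(j -> i) - P_i * sum_j rate(i -> j) = 0
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                a[i][j] += rates[j][i]
                a[i][i] -= rates[i][j]
    b = [0.0] * n
    # The n equations are dependent: replace the last one by sum(P) = 1.
    a[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Gauss elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    p = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][c] * p[c] for c in range(i + 1, n))
        p[i] = (b[i] - s) / a[i][i]
    return p


# Hypothetical two-state component: state 0 = up, state 1 = down.
lam, mu = 0.01, 0.5  # illustrative failure and repair rates per hour
p_up, p_down = steady_state([[0.0, lam], [mu, 0.0]])
print(p_up, p_down)
```

For the two-state case the result can be compared with the closed form, probability of the up state = mu / (lam + mu).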

The number of states tends to be large due to the size and complexity of
the transit system. The models can be reduced using the concept of
equivalent transition rate [10], which under the equilibrium condition is
given by

$$\lambda_{X^- X^+} = \text{the equivalent transition rate from subset } X^- \text{ to subset } X^+
= \frac{\sum_{i \in X^-} \sum_{j \in X^+} P_i \lambda_{ij}}{\sum_{i \in X^-} P_i} \tag{10.3}$$
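As an illustration of the equivalent transition rate, the sketch below merges two failed states of a hypothetical three-state model into one and computes the rate from the merged subset back to the operating state. The probabilities and rates are invented for the example (they satisfy the balance equations) and the function name is an assumption.

```python
def equivalent_rate(p, rates, x_minus, x_plus):
    """Equivalent transition rate from subset x_minus to subset x_plus.

    p[i] is the steady-state probability of state i and rates[i][j] the
    transition rate from state i to state j.
    """
    flow = sum(p[i] * rates[i][j] for i in x_minus for j in x_plus)
    return flow / sum(p[i] for i in x_minus)


# Hypothetical model: state 0 operating, states 1 and 2 failed.
p = [0.90, 0.06, 0.04]
rates = [[0.0, 0.02, 0.01],
         [0.30, 0.0, 0.0],
         [0.225, 0.0, 0.0]]

print(equivalent_rate(p, rates, [1, 2], [0]))  # merged repair rate
print(equivalent_rate(p, rates, [0], [1, 2]))  # merged failure rate
```

The merged repair rate is a probability-weighted mean of the individual repair rates, which is why merging is exact only under the mergeability conditions.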

The other techniques used for keeping the number of states within
manageable limits are state space truncation and sequential truncation
[10]. These techniques systematically omit or delete states with relatively
low probabilities.
10.2.2 Measures of Reliability
Transit systems, like other commercial or public systems, are designed to
meet a certain demand. Therefore, the reliability of transit systems can be
viewed in two ways. The system is comprised of hardware, software, and
the human interface, although in most of the reliability studies only
hardware and computer-based software (if any) are considered. One way
of looking at the reliability is in terms of the system deficiencies. The
measures relating to this approach are termed system-based reliability
measures. It is also desirable to know how these system deficiencies relate
to the inability of the system to satisfy the demand, and the corresponding
measures are called the demand-based reliability measures. Obviously both
of these types of measures are interrelated.

System-Based Measures. Reliability indices are usually defined in terms
of success or failure. Many complex systems like transit systems or electric
power systems have, however, several levels of failure and it is, therefore,
appropriate to define the calculated reliability measures in terms of subset
$X^+$, which may contain a specific number of system states. This subset
defines an event or a particular mode of degradation of the system. The
various modes or levels of system degradation can, therefore, be
represented by suitably defining the elements of $X^+$. As an example, $X^+$ may
be used to represent the system states having the number of failed passenger
stations greater than a particular number. The following measures defined
on $X^+$ have been used in this chapter.

1. Probability of $X^+$. This can be defined as the limiting value of the time
spent in $X^+$ as a fraction of the total operating time and is given by

$$P_+ = \sum_{i \in X^+} P_i \tag{10.4}$$

2. Frequency of encountering $X^+$. This is the mean number of occurrences
of $X^+$ per unit of the operating time and can be calculated by

$$f_+ = \sum_{i \in X^-} P_i \sum_{j \in X^+} \lambda_{ij} \tag{10.5}$$

where $X^-$ is the rest of the state space.

3. Mean cycle time of $X^+$. This is the mean time between successive
encounters of $X^+$ and equals the reciprocal of $f_+$, that is,

$$T_+ = \frac{1}{f_+} \tag{10.6}$$

4. Mean duration of $X^+$. This is the expected time of stay in $X^+$ in one
cycle (one cycle constitutes $X^+$ and $X^-$) and is given by

$$d_+ = \frac{P_+}{f_+} = P_+ T_+ \tag{10.7}$$
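The four system-based measures can be computed directly once the state probabilities and transition rates are known. The sketch below is illustrative only: the function name and the three-state data are assumptions, and the frequency is taken as the frequency of entering the subset, which in the steady state equals the frequency of leaving it.

```python
def subset_measures(p, rates, x_plus):
    """Probability, frequency, mean cycle time and mean duration of a subset."""
    inside = set(x_plus)
    outside = [i for i in range(len(p)) if i not in inside]
    prob = sum(p[i] for i in inside)
    # Frequency of encountering the subset: flow from the rest of the
    # state space into it.
    freq = sum(p[i] * rates[i][j] for i in outside for j in inside)
    cycle_time = 1.0 / freq
    duration = prob / freq
    return prob, freq, cycle_time, duration


# Hypothetical model: state 0 operating, states 1 and 2 degraded.
p = [0.90, 0.06, 0.04]
rates = [[0.0, 0.02, 0.01],
         [0.30, 0.0, 0.0],
         [0.225, 0.0, 0.0]]

prob, freq, cycle_time, duration = subset_measures(p, rates, [1, 2])
print(prob, freq, cycle_time, duration)
```

With rates per hour, the frequency is in occurrences per hour and the cycle time and duration are in hours.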

Demand-Based Measures [13]. The primary purpose of a transit system is
to move passengers between the various points of a network. It is,
therefore, important to have a measure of reliability as perceived by the
passengers. As an example [7], consider a jeep having an MTBF of
approximately 260 hours [5]. If this vehicle were driven on the average for
30 miles/day at an average speed of 15 miles/hour, the average interval
between two failures would be 130 calendar days, whereas this interval in
terms of operating time would be only about 11 days. Coming back to
transit systems, consider a system that breaks down on an average of every
15 calendar days with the average down time duration of half an hour. A
passenger who travels for only, say, 15 minutes twice a day will not be
affected by every system failure. Assuming uniform service level of 16
hours/day operation, the passenger may be affected on the average
approximately by every seventh failure and therefore will tend to see the
system failing on the average approximately three times a year. The
perception of the failure is further affected by several factors such as
whether the delay has to be tolerated in a comfortable, air conditioned
environment or in a hot stuffy vehicle and the personal temperament of
the passenger. Media reports and the stories of system failures told by
other passengers add something to the direct exposure to failures. It can be
appreciated that it is extremely difficult to measure or predict the
passengers' perception of the failures. Nevertheless, suitable measures
related to the impact of system failures on the passengers can be devised.
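The arithmetic behind the jeep and passenger examples above can be laid out explicitly. This is only a worked restatement of the quoted figures; the "every seventh failure" fraction is taken from the text as given and is not derived here.

```python
mtbf_hours = 260.0            # MTBF in operating hours
hours_per_day = 30.0 / 15.0   # 30 miles/day at 15 miles/hour = 2 hours/day

# Interval between failures as seen on the calendar and in operating time.
calendar_days_between_failures = mtbf_hours / hours_per_day
operating_days_between_failures = mtbf_hours / 24.0

# Transit system failing every 15 calendar days.
system_failures_per_year = 365.0 / 15.0
# The passenger is assumed to be affected by every seventh failure.
perceived_failures_per_year = system_failures_per_year / 7.0

print(calendar_days_between_failures)    # 130.0
print(operating_days_between_failures)   # about 10.8
print(perceived_failures_per_year)       # about 3.5
```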

The ability of a transit system to continue providing transportation
services as scheduled or advertised may be termed as the operational
reliability. In this definition, it is assumed that schedules are set within
normal capabilities of the system. If the transit system cannot provide
scheduled service when every subsystem is working normally, it is a
problem of planning, scheduling, or operations management. A failure in a
subsystem, however, causes a perturbation that may affect the system's
ability to provide adequate service. The impact of failures on the
schedule-keeping ability of a transit system is the concern of operational
reliability. The demand-based measures of reliability are, therefore, also
the measures of operational reliability. Some measures are described here.

DELAY, THE BASIC MEASURE OF OPERATIONAL RELIABILITY [11]. So long as a
passenger can get from one station to another in a comfortable, safe, and
timely manner, a failure occurring in a system does not bother him. A
minor delay may be hardly noticed, but if the passengers suffer long delays
or if the delay is too frequent, the transit system would appear unreliable
to the passengers. The failure-induced delay is, therefore, a basic
measure of operational reliability. The delay may be incurred at the
following points.

Departure points. When a passenger arrives at a departure point the
system may be inoperative or operative in a degraded mode. This adds to
the waiting time, making the total travel time longer.

Delay during travel. The passengers on board the vehicles may suffer
delay due to the breakdown of a subsystem. The delay can be broadly
classified into three categories:

1. Minor delay < x units of time.
2. Major delay > x units of time.
3. Entrapment, a major delay requiring passenger evacuation.

There does not appear to be enough relevant data on the passenger
reaction to being delayed; this appears to be a useful although difficult
area for investigation. Despite extensive investigations, some judgment will
always be involved in fixing the value of x. Some basic demand-based
measures of operational reliability in terms of delay are listed below
[11].

1. $P(x_1, x_2)$, that is, the probability that a passenger will encounter a
delay $> x_1$ and $< x_2$ on a trip. The length of the delay is defined by
the interval $(x_1, x_2)$. The interval $(0, x)$ means a delay less than x,
whereas $(x, \infty)$ means a delay longer than x.

2. MTBD, the mean time between two successive delays suffered by a 
passenger. 

3. Expected value of delay. 

Using the frequency concept of probability, $P(x_1, x_2)$ can also be
interpreted as the limiting value of the proportion of the trips having a
delay in $(x_1, x_2)$ to the total number of trips. Suppose that a person makes
a large number of trips of varying lengths. If the trips on which the person
suffered a delay of, say, greater than 5 minutes were counted and then
divided by the total number of trips, it would approximate $P(5, \infty)$. The
probability can be further converted into MTBD by knowing the number of
trips in a year. It will also be desirable to compute the expected value of
delay. Data, however, may not be available to compute these measures. The
calculation of these measures could be simplified by relating the delay to
the vehicles rather than the passengers. In such a case the ratio of vehicle
delayed trips to the total trips would be calculated.
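The frequency interpretation above translates directly into a counting estimate. The sketch below uses a made-up record of trip delays; the function names, the data, and the figure of 500 trips per year are assumptions for illustration.

```python
def delay_probability(delays, x1, x2=float("inf")):
    """Fraction of trips whose delay d satisfies x1 < d < x2."""
    hits = sum(1 for d in delays if x1 < d < x2)
    return hits / len(delays)


def mtbd_years(probability, trips_per_year):
    """Mean time between delays, in years, given the trips made per year."""
    return 1.0 / (probability * trips_per_year)


def expected_delay(delays):
    """Sample mean of the delay per trip."""
    return sum(delays) / len(delays)


# Hypothetical record of delays (minutes) on 20 trips.
delays = [0, 0, 1, 0, 7, 0, 0, 2, 0, 0, 0, 12, 0, 0, 3, 0, 0, 0, 0, 6]

p_major = delay_probability(delays, 5)   # estimate of P(5, infinity)
print(p_major)                           # 0.15
print(mtbd_years(p_major, 500))          # 1/75 of a year
print(expected_delay(delays))            # 1.55 minutes per trip
```

Replacing passenger trips by vehicle trips in the record gives the vehicle-based version of the same measures.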
Many measures of operational reliability can be defined [2-4]. These
measures should reflect the delay incurred or travel time lost by the
passengers or vehicles. The real difficulties lie, however, not in defining
measures of operational reliability but in developing suitable analysis
techniques and obtaining valid data for calculating these measures.

LOCE OR LOCP. As noted earlier, it is not difficult to define more
sophisticated measures of operational reliability. The harder part is the
data and subsequent synthesis of this data to calculate the measure. Any
measure may prove to be satisfactory so long as it reasonably reflects the
ability of the system to provide adequate transportation service. One simpler
measure may be called "Loss of Capacity Expectation" (LOCE) or "Loss of
Capacity Probability" (LOCP) and can be defined as the probability that
the system will not have enough capacity to meet the demand adequately.
This can also be interpreted as the expected value of time during which the
system cannot meet the demand. This will include the periods of degraded
operation, for example, not enough vehicles being available or some other
subsystem failure causing deficient operation. The LOCE is computed as

$$\mathrm{LOCE} = \sum_i P_i Q_i \tag{10.8}$$

where $P_i$ = probability of the system being in state i
      $Q_i$ = probability that the system will not be able to meet demand in
              state i

The LOCE gives the same weight to all system deficiencies irrespective 
of the magnitude of the impact and it is therefore likely to be a conserva- 
tive measure. This approach is identical to the use of LOLE (loss of load 
expectation) in power generation planning studies [1] by the electric power 
utilities. 
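A small numerical illustration of the LOCE calculation follows. The state probabilities and the per-state probabilities of not meeting demand are invented for the example, and the function name is an assumption.

```python
def loce(state_probs, deficiency_probs):
    """Loss of capacity expectation: sum over states of P_i * Q_i."""
    return sum(p * q for p, q in zip(state_probs, deficiency_probs))


# Hypothetical three-state system: normal, degraded, failed.
P = [0.90, 0.07, 0.03]   # probability of being in each state
Q = [0.0, 0.2, 1.0]      # probability of not meeting demand in each state

value = loce(P, Q)
print(value)             # about 0.044
print(value * 16 * 365)  # expected deficient hours/year at 16 hours/day
```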
This section first describes the model for a single vehicle and then for the
system of vehicles. The vehicle includes the body structure and all onboard
equipment carried by the vehicle. A vehicle can have several modes of
failure. For the sake of simplicity, however, each component of the vehicle
is assumed to have two modes of failure.

Retrieval or Total Failure Mode. With this type of failure, the vehicle is
immobilized on the track and it cannot move on its own. External
assistance is normally required for clearing it from the track. This kind of
failure is severe and causes serious delay to the passengers as not only the
affected vehicle is stuck but the upstream vehicles also come to a halt.
Partial Failure Mode. This type of failure degrades the vehicle
performance but the vehicle is not immobilized and can clear the track on
its own power. Such failures may slow down but do not interrupt the
traffic. Once the vehicle has cleared the track and moved into the
maintenance yard, the normal flow of traffic is resumed.

10.3.1 Vehicle Model
The reliability model for a single track bound transit vehicle is shown in
Figure 10.2, where the following notation is used:

$\lambda_{i1}, \lambda_{i2}$ = the retrieval and partial mode failure rates of the ith
          component, that is,
$\lambda_{i1}$ = 1/(mean time between retrieval mode failures of the ith
          component)
$\lambda_{i2}$ = 1/(mean time between partial mode failures of the ith
          component)
$\mu_r$ = retrieval rate = 1/mean time to retrieve a vehicle
$\mu_p$ = partial failure recovery rate = 1/mean time to clear a partially
          failed vehicle
$\mu_i$ = repair rate of the ith component = 1/mean time to repair the
          ith component

Figure 10.2 Vehicle model.

Figure 10.3 Equivalent vehicle model.



The normally operating state of the vehicle is denoted by "O." From this
state, the vehicle can fail into a retrieval mode failure (denoted by RET) or
a partial mode failure, denoted by P. The vehicle is either retrieved from
the track or in the case of partial failure it clears the track on its own
power and passes into the down state (denoted DN) in the maintenance
area. The vehicle is then repaired back into its normally operating state O.
It should be noted that the mean retrieval time is assumed to be the same
for failures originating from different components. The same is true for the
mean time to clear a partial mode failure.

The vehicle states can be grouped using the concept of equivalent
transition rate defined in (10.3). The equivalent model is shown in Figure
10.3 where



Equivalent state    Original states
(Figure 10.3)       (Figure 10.2)             Description

O                   O                         operating state
RET                 (11, 21, ..., i1, ...)    retrieval mode failure
P                   (12, 22, ..., i2, ...)    partial mode failure
DN                  (13, 23, ..., i3, ...)    being repaired off track



Strictly speaking, the merging of states (13, 23, ..., i3, ...) is correct only if
the component repair rates are equal (see conditions of mergeability [10]).
The error introduced because of nonequality of repair rates is, however,
relatively small. The equivalent transition rates of Figure 10.3 can be
calculated using (10.3). As an example, for calculating $\lambda_r$, the equivalent
transition rate from the state O to state RET,

$$X^- = (O)$$

and

$$X^+ = (11, 21, \ldots, i1, \ldots)$$

and therefore

$$\lambda_r = \sum_i \lambda_{i1} \tag{10.9}$$

Similarly for $\mu$, the equivalent transition rate from the state DN to the
state O,

$$X^- = (13, 23, \ldots, i3, \ldots)$$

and

$$X^+ = (O)$$

Therefore

$$\mu = \frac{\sum_i P_{i3}\,\mu_i}{\sum_i P_{i3}} \tag{10.10}$$

Now

$$P_{i3}\,\mu_i = P_{i1}\,\mu_r + P_{i2}\,\mu_p$$

that is,

$$P_{i3}\,\mu_i = (\lambda_{i1} + \lambda_{i2})\,P_O \tag{10.11}$$

Substituting $P_{i3}$ from (10.11) into (10.10),

$$\mu = \frac{\sum_i \lambda_i}{\sum_i \lambda_i / \mu_i} \tag{10.12}$$

where $\lambda_i = \lambda_{i1} + \lambda_{i2}$.

In a similar fashion it can be proved that the equivalent transition rates $\mu_r$
and $\mu_p$ in Figure 10.3 have the same values as $\mu_r$ and $\mu_p$ in Figure 10.2.
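The reduction of the component-level vehicle model to the four-state equivalent model can be sketched as follows. The retrieval rate follows the sum of component rates derived above; the partial failure rate is obtained by the same argument (an assumption by analogy), and the equivalent repair rate is the ratio of the total failure rate to the sum of failure-rate-to-repair-rate ratios. Function name and data are illustrative.

```python
def equivalent_vehicle_rates(components):
    """Equivalent rates of the merged vehicle model.

    components: list of (lam_retrieval, lam_partial, mu_repair) tuples,
    one per vehicle component.
    """
    lam_r = sum(l1 for l1, _, _ in components)
    # Assumed by analogy with the retrieval mode failure rate.
    lam_p = sum(l2 for _, l2, _ in components)
    total = lam_r + lam_p
    # Weighted repair rate: sum(lam_i) / sum(lam_i / mu_i).
    mu = total / sum((l1 + l2) / m for l1, l2, m in components)
    return lam_r, lam_p, mu


# Hypothetical two-component vehicle (rates per hour).
components = [(0.001, 0.004, 0.5), (0.002, 0.003, 0.25)]
lam_r, lam_p, mu = equivalent_vehicle_rates(components)
print(lam_r, lam_p, mu)
```

When all component repair rates are equal, the equivalent repair rate reduces to that common value, which is the case in which the merging is exact.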
The state space model of a single vehicle is described in Section 10.3.1 and
this section now describes the model for the system of vehicles, that is, all
the passenger carrying vehicles in the system. The model is based on the
following assumptions:

1. Only the operating vehicles are liable to fail and the vehicles on standby
or maintenance are not subject to failure. This assumption results in
assigning zero failure rates to the vehicles being maintained or in
standby mode. The failure rate of a cold standby can be generally
assumed zero. If the vehicles are in a warm standby mode, that is,
partially powered, this assumption is valid only so long as the failure
rate in warm standby mode is small as compared with that of the
operating vehicle.

2. The failure of a single vehicle in the retrieval mode causes the whole
system of vehicles to be down. The duration of the down time of a
vehicle is considered from the time it comes to a halt to the time of
resuming normal operation. The down time of all the vehicles, the
directly affected vehicle and the vehicles coming to a halt as a result of
blocking, is assumed to be the same. This assumption was made because
of short headways and a relatively small loop length. In a larger loop
and longer headways, all the vehicles may not be equally affected and a
correction to models may be needed.

3. A vehicle in a partial failure mode is assumed to be removed from the
system as soon as possible after the occurrence of the failure and
therefore the probability of another unit failing during this period or the
partial mode passing into full mode is assumed zero.

4. So long as there is even one standby, a unit on which maintenance is
completed will be interchanged with an operating or standby unit.
When, however, no spares are left, the unit passes directly from
maintenance into the operating mode, without going on stand-by.

A section of the state transition diagram of the model for the system of
vehicles is shown in Figure 10.4, where n, s, and m denote the number of
operating, spare, and on-maintenance (preventive) units, respectively. The
bigger squares represent the operating states and the small squares denote
the corresponding partial and retrieval mode states. The index in the top
left corner of the operating state is the state number and the number of
failed units is indicated in the lower left corner. The first column of states
has m units on maintenance and is called group m in Figure 10.4. The
second column is called group (m - 1) and has (m - 1) units on maintenance,
and so on. Only two groups are shown in Figure 10.4. In the state
index $i_x$, i denotes the state number and x denotes the affiliation:

o = operating state
r = retrieval state
p = partially operating state



For example, $2_o$ indicates state 2 is in the operating mode and $2_r$ and $2_p$
indicate the associated retrieval and partial modes. The relationships
between the various states can be better understood by tracing through a
few of the states. Starting at the top, $1_o$ represents the state when every unit
is as it should be. In this state there are n operating units, s spare units to
replace the failed units, preventive maintenance is being carried on m
units, and there is no failed unit. From state $1_o$, the system could transit
into state $1_r$ or $1_p$ by the failure of a unit in the retrieval and partial mode,
respectively. In state $1_r$, the failed vehicle sits on the track and brings the
system to a halt. After the failed vehicle is removed, the system enters state
$2_o$; there is one failed unit, and this failed unit has been replaced by a
spare unit, reducing the number of spares by one. Similarly from state $1_p$,
the system will transit to state $2_o$ by the partial failure recovery. From state
$2_o$, the system could transit into state $1_o$ by the repair of the failed unit or
it could transit into $2_r$ or $2_p$ by the failure of another unit. This pattern of
transitions continues until state $(s+1)_o$ with s failed units and 0 units on
stand-by is reached. In state $(s+1)_o$, in addition to the pattern of
transitions discussed earlier, another mode of transition is introduced, that is,
when maintenance on a unit is now completed, it is put in the standby
mode (group m - 1). Now consider state $(s+2)_o$ in which (n - 1) units are
operating, that is, one less than the required number. If maintenance is
now completed on a unit, it is put into the operating mode and the system
transits into $(s+n+1)_o$. The rest of the states can be traced in a similar
manner.

10.3.3 Reduced Vehicle System Model
The number of system states can be derived from Figure 10.4 as

$$\mathrm{NVS} = 3(s - 1) + (m + 1)(3n + 4) \tag{10.13}$$

For example, for n = 50, s = 2, and m = 4, NVS = 773 states. The number of
states can be considerably reduced by merging $i_r$ and $i_p$ with $i_o$. The
reduced model is shown in Figure 10.5 and its state i is equivalent to
$(i_o, i_r, i_p)$ of Figure 10.4. The number of states is given by

$$\mathrm{RNVS} = (s - 1) + (m + 1)(n + 2) \tag{10.14}$$

Now for n = 50, s = 2, and m = 4, RNVS = 261 as compared with NVS = 773.
For m units on maintenance, there are (m + 1) groups of states in Figure
10.5. The equivalent transition rates between the various states of Figure
10.5 can be calculated using (10.3).

[Figure: state transition diagram of vehicle system.]
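The state counts of the full and reduced vehicle system models are easy to tabulate. In the sketch below the leading coefficient 3 in the full-model count is an assumption inferred from the quoted total of 773 states; the function names are illustrative.

```python
def nvs(n: int, s: int, m: int) -> int:
    """Number of states of the full vehicle system model."""
    return 3 * (s - 1) + (m + 1) * (3 * n + 4)


def rnvs(n: int, s: int, m: int) -> int:
    """Number of states after merging i_r and i_p with i_o."""
    return (s - 1) + (m + 1) * (n + 2)


print(nvs(50, 2, 4))   # 773
print(rnvs(50, 2, 4))  # 261
```

The reduction is close to a factor of three, as expected when three states are merged into one.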

Equivalent Transition Rates within a Particular Group. The transition rate
from state i to state i + 1 is

$$\lambda_{i,(i+1)} = \frac{P_{ir}\,\mu_r + P_{ip}\,\mu_p}{P_{io} + P_{ir} + P_{ip}} \tag{10.15}$$

Now

$$P_{ir} = \frac{O_i \lambda_r P_{io}}{\mu_r} \tag{10.16}$$

and

$$P_{ip} = \frac{O_i \lambda_p P_{io}}{\mu_p} \tag{10.17}$$

where $O_i$ is the number of operating units in state i. Substituting for $P_{ir}$
and $P_{ip}$ from (10.16) and (10.17) into (10.15),

$$\lambda_{i,(i+1)} = \frac{O_i(\lambda_r + \lambda_p)}{1 + O_i(\lambda_r/\mu_r + \lambda_p/\mu_p)} \tag{10.18}$$



Similarly

(10.19)

where $\theta_i$ is the number of failed units in state i.
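The equivalent rate from a merged state to the next state in the same group can be checked numerically. The sketch below assumes that each of the operating units fails at the equivalent vehicle rates (retrieval and partial), that the retrieval and partial states are left at the retrieval and recovery rates, and it compares the closed form with the direct flow-over-probability form. All names and numbers are illustrative.

```python
def merged_failure_rate(o_i, lam_r, lam_p, mu_r, mu_p):
    """Closed-form equivalent rate from merged state i to state i + 1."""
    return o_i * (lam_r + lam_p) / (1.0 + o_i * (lam_r / mu_r + lam_p / mu_p))


# Hypothetical data: 10 operating units, rates per hour.
o_i, lam_r, lam_p, mu_r, mu_p = 10, 0.003, 0.007, 2.0, 6.0

# Direct form: flow out of the retrieval and partial states divided by
# the probability of the merged state, with P_io set to 1 (it cancels).
p_io = 1.0
p_ir = o_i * lam_r * p_io / mu_r
p_ip = o_i * lam_p * p_io / mu_p
direct = (p_ir * mu_r + p_ip * mu_p) / (p_io + p_ir + p_ip)

closed = merged_failure_rate(o_i, lam_r, lam_p, mu_r, mu_p)
print(direct, closed)
```

When retrieval and recovery are fast compared with failures, the merged rate is close to the number of operating units times the total vehicle failure rate.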



Equivalent Transition Rates from a Particular Group to the Next Group (say j
to j + 1). The equivalent transition rate from state i to i + n + 1 is given by

(10.20)

where $m_i$ = number of units on maintenance in state i
      $T_m$ = mean maintenance time
      $\gamma_i$ = 0 for number of spares in state i > 0
               = 1 otherwise

The Equivalent Transition Rates from a Particular Group to the Previous
Group. These are shown in Figure 10.5.

10.3.4 Solution for State Probabilities and Reliability Measures

State Equations. The steady-state equations for the equivalent model 
(Figure 10.5) can be written using (10.2). In the matrix notation this can be 
written in the form, 

AP=0 (10.21) 
where A = an N x N matrix such that its ijth term represents the transition
          rate from state j to state i, N being the total number of states
      P = a column vector whose ith term is $P_i$, that is, the probability of
          being in state i
      0 = a column vector having all elements as zero

The N equations of (10.21) are linearly dependent, that is, any equation
can be obtained from the remaining (N - 1) equations. Therefore any
(N - 1) equations of (10.21) together with the total probability equation
(10.22) can be solved to obtain P.

$$\sum_i P_i = 1 \tag{10.22}$$

In the matrix form the elements of any one row of A and the corresponding
element in 0 are changed to 1.0 before solution. The linear equations can be
solved using numerical methods like Gauss elimination. Once the
probabilities of the equivalent states have been determined, the
probabilities of the original states can be calculated using the following
equations:

$$P_{io} = \frac{P_i}{1 + O_i(\lambda_r/\mu_r + \lambda_p/\mu_p)} \tag{10.23}$$

$$P_{ir} = \frac{O_i \lambda_r P_{io}}{\mu_r} \tag{10.24}$$

and

$$P_{ip} = \frac{O_i \lambda_p P_{io}}{\mu_p} \tag{10.25}$$
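Recovering the original state probabilities from a merged state probability is a simple proportional split. The sketch below assumes the same balance relations as in the merging step, namely that the retrieval and partial state probabilities are the operating state probability times the number of operating units times the ratio of failure rate to retrieval or recovery rate. The function name and the numbers are illustrative.

```python
def split_state_probability(p_i, o_i, lam_r, lam_p, mu_r, mu_p):
    """Split the probability of merged state i into (P_io, P_ir, P_ip)."""
    ratio_r = o_i * lam_r / mu_r
    ratio_p = o_i * lam_p / mu_p
    p_io = p_i / (1.0 + ratio_r + ratio_p)
    return p_io, ratio_r * p_io, ratio_p * p_io


# Hypothetical merged state with probability 0.8 and 10 operating units.
p_io, p_ir, p_ip = split_state_probability(0.8, 10, 0.003, 0.007, 2.0, 6.0)
print(p_io, p_ir, p_ip)
```

The three parts always add back to the merged state probability, which is a useful check on an implementation.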

Vehicle Exposure Factor. The vehicle system model described above
is valid for the on-line stations. When, however, the stations are
off-line, the retrieval mode failure of a vehicle within the station limits
does not immobilize the rest of the vehicles since they can bypass on the
express lane. The retrieval state $i_r$ can, therefore, be decomposed into $i_{rs}$
and $i_{rl}$, representing failure within the station limit and on-line,
respectively. The probabilities for these states can be computed as follows:

$$P_{irs} = P_{ir}\,E \tag{10.26}$$

and

$$P_{irl} = P_{ir}\,(1 - E) \tag{10.27}$$

where E is the exposure factor, defined as the limiting ratio of the unit
failures within the station limits to the total failures and is assumed
approximately equal to the ratio of the length of the guideway within
station limits to the total length of the guideway.
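The exposure factor split can be illustrated with a short calculation. The sketch assumes, as stated above, that the exposure factor is approximated by the ratio of guideway length within station limits to total guideway length; the function name and the lengths are invented for the example.

```python
def split_retrieval_probability(p_ir, station_length, total_length):
    """Split a retrieval state probability into in-station and on-line parts."""
    exposure = station_length / total_length  # exposure factor E
    return p_ir * exposure, p_ir * (1.0 - exposure)


# Hypothetical guideway: 1.5 km within station limits out of 10 km.
p_station, p_online = split_retrieval_probability(0.002, 1.5, 10.0)
print(p_station, p_online)
```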

The vehicle failures within the station limits will, however, shut down the
station lane and are, therefore, considered a part of the station lane
failures. The equivalent failure rate component to be added to the station
lane failure rate is calculated by

Ks=~ (10.28) 

i 

where N ST is the number of passenger stations. 

Vehicle System Reliability Measures. Once the state probabilities have been calculated, the event (subset of states) probabilities, frequencies, and other measures can be calculated using (10.4)-(10.7). There are many ways of defining the events, and two of them are described below.

Exact Measures. These measures calculate the probabilities and frequencies of encountering states in which the number of operating vehicles is equal to a particular number. They can be designated as $P(N_o = n)$ and $f(N_o = n)$, where $N_o$ denotes the number of operating vehicles. The probability of $N_o = n$ can be calculated simply by adding the probabilities of all states having n operating vehicles. The frequency and other measures can be calculated using (10.5)-(10.7).

Cumulative State Measures. Another way to represent these measures is to calculate the probabilities and frequencies of encountering states in which there are fewer than a particular number of operating vehicles. The equations for these measures are given below.

*(^«*> 2 ( W, +/»„)+ 2 /> frl 00.29) 

and 

/( N B < n ) - P ktr - O k X , E+ P kpf t p + 2 P,^ A , (1 - E) ( 10.30) 

i 

where state $k_o$ is such that for state $(k_o + 1)$, $N_o = n$.
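The grouping of state probabilities into exact and cumulative measures can be sketched as below. This is a simplified illustration, not the formulation of (10.29): each state is assumed to be already labeled with its number of operating vehicles, the retrieval states are not treated separately, and the state list and function names are hypothetical.

```python
from collections import defaultdict


def exact_probabilities(states):
    """Add the probabilities of all states having the same number of
    operating vehicles. `states` is a list of (n_operating, probability)."""
    totals = defaultdict(float)
    for n_operating, prob in states:
        totals[n_operating] += prob
    return dict(totals)


def probability_fewer_than(exact, n):
    """Cumulative measure: probability of fewer than n operating vehicles."""
    return sum(p for k, p in exact.items() if k < n)


# Hypothetical state probabilities (they sum to 1).
states = [(10, 0.90), (10, 0.04), (9, 0.05), (8, 0.01)]
exact = exact_probabilities(states)
p_fewer_than_10 = probability_fewer_than(exact, 10)
```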

10.3.5 Vehicle System Example 

The models described in this chapter have been implemented in a computer program [12]. Starting with the component data, the program generates the transition rate matrices for the subsystem and system models and then solves these matrix equations to provide system-based reliability measures.

The vehicle system data for this example are printed out in Table 10.1A. The system consists of 14 vehicles, of which 2 are on maintenance and 2 are kept as spares. The assumed failure rate and other data are also listed in Table 10.1A. The probabilities of being in various states (see Figure 10.4) are printed in Table 10.1B. The state description on the right-hand side pertains to io, that is, the operating state. Column P contains probabilities of the operating states io, and the associated columns PR, PS, and PP give the probabilities of the associated retrieval and partial states. It can be seen that the most significant probability values are for states (10,2,2), (10,1,2) and the operating states of (10,1,1) and (10,1,0). The state probabilities are grouped as a function of the exact number of vehicles in Table 10.1C. The second column, "PROB OF OPTG," gives the probability of being in the operating state with the number of vehicles indicated in the first column. The third column, "FREQ OF OPTG," can be interpreted in either of two ways: (a) the number of times the system transits out of the state in a day, or (b) the number of times per day the state is entered by the system. The fourth and fifth columns indicate the probabilities and frequencies of encountering the partial operating states. The probability and frequency of the system being in the retrieval state are given at the bottom. The reliability measures are arranged in the cumulative form in Table 10.1D, where the reliability measures for $N_o < 9$ and downward are all approximately



Table 10.1A Vehicle system data

MEAN TIME TO VEHICLE FAILURE (PERMANENT)     = 5000.0000 HOURS
MEAN TIME TO VEHICLE FAILURE (PARTIAL MODE)  =  500.0000 HOURS
MEAN TIME TO VEHICLE RETRIEVAL               =    0.5000 HOURS
MEAN TIME TO CLEAR PARTIAL MODE VEHICLE      =    0.2500 HOURS
MEAN TIME TO VEHICLE MAINTENANCE             =    3.0000 HOURS
MEAN TIME TO VEHICLE REPAIR                  =    2.5000 HOURS

NUMBER OF OPERATING VEHICLES       = 10
NUMBER OF SPARE VEHICLES           = 2
NUMBER OF VEHICLES ON MAINTENANCE  = 2
VEHICLE EXPOSURE FACTOR            = 0.3000



equal. This is because, with 2 spares and 2 units on maintenance that can also behave as spares, the probabilities of states with fewer than 10 operating units are relatively low and are therefore dominated by the retrieval state probability.

Table 10.1B Vehicle system state probabilities
Table 10.1C Exact state probabilities and frequencies of the vehicle system
Table 10.1D Cumulative state probabilities and frequencies of the vehicle system

Table 10.2A Vehicle system data

The results for another example, which is the same as the previous one except that there are no units spare or on maintenance, are given in Tables 10.2A-D. In Table 10.2D, the reliability measures tend to be almost equal from $N_o < 7$ downward, as compared with $N_o < 9$ in Table 10.1D. The probability and frequency of encountering states with a specified number of failures depend upon the failure rate of units, the number of spares, and the number of units on maintenance. These relationships will become clearer in the section on sensitivity studies.

10.4 SYSTEM MODEL FOR TRAINS 

When the trains are regarded strictly as single units for operation, spares, and maintenance, the vehicle system model described in Section 10.3 can be used. A model, however, is also possible based on a more flexible policy for train formation. The general procedure is the same as for the vehicle system:

1. Generate the possibility space by enumerating and describing the possible system states.

2. Develop the transition rate matrix.

3. Solve for probabilities and calculate reliability measures.



Table 10.2B Vehicle system state probabilities
Table 10.2C Exact state probabilities and frequencies of the vehicle system
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10 

g 

B 
7 

6 

a 

A 
3 

I 

0 



0.94101 IE no 
0.520380 C-0 i 
0. 1 294 B3fi**0£ 
0. I 90947C-04 
0. lft47?3E-06 
0. 122601^-0 9 
0.56490ft^-l 1 
0. I 7B4 71 E- 1 3 

a. 362 7iaE-ie 

0 . 2Qfl 34H~-2 a 

0. oaoo no-' 0 0 



a,4969'>5E oo 
0.521449E 00 
0. 351 354E-01 
1. 5>aoi 9*S~03 
O. 70 >1 O9F.-05 

n, 57fe0l 4E-07 

0.31623QE-0 r J 
O. 1 1 573*P-1 I 
0 . 26 7052C-1* 
O.OOOOOOE 00 

o.moanJF oo 



0.4 70506E-O2 
0.2 3290 IE- 03 
0 .51 2381E-05 
0.5575S6E-O7 
0 .542484E-09 
0.29a3&6£-l 1 
0* 109*0 IE- 13 
0 .257B61E-1 6 
0.347246E-I 9 
O.OOOOOOE 00 
O.OOOOOOE OO 



D,4516S6E 00 

0 .491BH6E-Q3 
0.63l£5*E-05 
0 p S207SSE-QT 
0 . 2S&A3 1E-09 
0. L0502SE-U 
0.247546E-1* 
0.333357E-17 
O.OOOOOOE 00 
O.OOOOOOE 00 



RET STATE pmjAilILITV= 0 .69 1 7 7 2F-0 ? 

RET STATE FRROUENCT( PER OAYI= 0. 3721 0 I 



Table 10.2D Cumulative state probabilities and frequencies of the vehicle system

OPERATING VEHICLES EQUAL TO OR LESS THAN    PROBABILITY    FREQUENCY PER DAY    CYCLE TIME DAYS    MEAN DURATION






10 


0. 1 OOtflOE 01 


' D00 = 00 


0. 0O0O0OE 


oo 


0.0000Q0E 00 




0.542H>3^-01 


i.i^fiasse oo 


0 . 20 1 266E 


01 


0.109252E 00 


9 


1. 201 1 A-4E-02 


1. ">62 123E-01 


0. 177B97E 


02 


0. 357S29 E-Ol 


7 


0.71! Ihx^ -o 1 


0 .3372«2«-01 


0. 296525S 


02 


0.21 09* 3E-01 




0. Ci^a^ 3C-03 


n, 33? 245E-01 


0 . 3009B3E 


02 


0.20B349 E-01 


0.ft , i3o*2E-n3 


0.332 1 30E-01 


0.3Q1 041 E 


02 


0 .20S3 3 3E-01 


4 


n .692041= -03 


0. 3321RO«E-01 


0. 30lO*2E 


02 


0 . 20833 3E-01 


3 




1.33 2 ISO E-01 


0. 301042E 


02 


0.208333E-0 1

The state space is generated by a subroutine in the computer program [12] according to the following rules for the train formation.

1. When a vehicle in the train fails, the train is removed from service and a spare train, if available, is put into operation as a replacement. If, however, a complete spare train is not available, the defective vehicle is replaced by a good vehicle and the train is put back into operation. If no good vehicle is available, the defective vehicle is removed and the train is put back into operation. There is no additional difficulty in modeling with married pairs, as they can be treated as single units.

2. Train-consists of different lengths are allowed. 

3. In case of no available spares and when all trains are not full length, a 
vehicle on which maintenance or repair is completed is attached to the 
train having the least number of vehicles. 

In the rules outlined above, the attempt is to keep the maximum vehicle 
system capacity in the operating condition. Rules 1-3 represent only one 
policy and models can be similarly developed for other policies. The results 
of the solution of the train model are grouped in terms of vehicle system 
capacity, that is, the output format is the same as for the vehicle system 
model. 



10.5 PASSENGER STATION MODEL 

The stations are assumed to be off-line and are basically of two types:

1. Stations having one station lane and an express lane for through traffic. These are termed type A stations.

2. Stations having two station lanes and an express lane, called type B stations.

10.5.1 Model for a Single Station

The state transition diagrams for the type A and type B stations are shown in Figures 10.6 and 10.7, respectively, using the following notation:

U = normal station operation, that is, both the station lane and the express lane are working
D = station lane down
L = failure of the station lane as well as the express lane, that is, complete failure of a type A station
$L_1$ = station lane working but express lane down

Figure 10.6 State transition diagram of type A station.



For the type B station:

2D = both the station lanes down
$L_1$ = one station lane working and the express lane down
$L_2$ = both station lanes working with the express lane down
$\lambda_s, \mu_s$ = failure and repair rates of the station lane, including the contribution $\lambda_{vs}$ of the vehicle system failure
$\lambda_e, \mu_e$ = express lane failure and repair rates

The impact of the various station states on the system is tabulated in Tables 10.3 and 10.4.

Figure 10.7 State transition diagram of type B station.

Table 10.3 Type A Station

Station State    Impact on System       Impact on Station
U                Traffic can pass       Passengers can embark/disembark
D                Traffic can pass       Passengers cannot embark/disembark
L                Traffic cannot pass    Passengers cannot embark/disembark
L_1              Traffic can pass       Passengers can embark/disembark
10.5.2 Model for System of Stations

A transit system may have a number of type A stations and a number of type B stations. The model for the system can be built by combining the models for the individual stations. It is assumed that when both the station lanes and the express lane at a station are down, so that traffic cannot pass through, no further failure of stations takes place.

The model for the system of stations is built by sequential addition. This is achieved by adding one station at a time, solving for state probabilities, deleting low probability states, and repeating the procedure for the next station. This procedure helps to keep the number of system states within manageable limits. The process of sequential addition is illustrated in Table 10.5 for three type A stations with the following hypothetical data:

mean up time of the station lane = 800 hours
mean down time of the station lane = 2 hours
mean up time of the express lane = 1000 hours
mean down time of the express lane = 2 hours



Table 10.4 Type B Station

Station State    Impact on System       Impact on Station
U                Traffic can pass       Passengers can embark/disembark
D                Traffic can pass       Passengers can embark/disembark
2D               Traffic can pass       Passengers cannot embark/disembark
L                Traffic cannot pass    Passengers cannot embark/disembark
L_1              Traffic can pass       Passengers can embark/disembark
L_2              Traffic can pass       Passengers can embark/disembark



Table 10.5 Sequential Model Building
(d) Truncation of states with probabilities less than $10^{-8}$


J 








1 




2 


0 


0 


n 


n QQins i 

U.Tl'lUJ 1 




2 




I 


1 


0 


u 


n do^}^ v tn 


-i 


3 




I 


0 


J 


0 


U . 7 7. 1 JH. 1 A. | \J 


■■ 


4 




1 


Q 


0 


I 
i 


U r JjV^rJrU 1 X 


-1 


5 




u 


I 


0 


■ i 


n a i fiWM v i n 


- 5 


6 




0 


i 


0 


[ 


ft y in 


-J 


7 




0 


0 


(1 






-S 


(e) Addition of the third station
















I(1.D 




3 


n 


(1 








2(2.1) 




2 


1 


0 


(i 






3(3,1) 




2 


0 


1 


u 






4(4,1) 




2 


0 


0 












Table 10.5 (Continued)



System 
State 



Identical 

states 



No. ol stations 
in stale 



U 



Probability 



5(5. 1) 

«6,D 

7(7,1) 

8(1,2) 

9(2,2) 

10(3.2) 

IK4.2) 

12(5,2) 

13(6,2) 

14(7,2) 

15(1.3) 

16(2.3) 

17(4,3) 

18(5,3) 

19(6.3) 

20(7,3) 

21(1.4) 

22(2,4) 

23(3,4) 

24(4,4) 

25(5,4) 

26(6.4) 



27(7,4) 




Merging of identical 


1 


1 


2 
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3,15 


4 


4,21 


5 


5,9 
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6,11.22 
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3 


10, 16 
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(3,3) not possible 
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0.7399MX10" 2 
0.14W35K10 -4 
0.591964X I0" s 
0 1 84865 x 10-* 
0.29576OX10 -4 
0.118294X10 -4 
0-49350€xlO- T 
0.153901 xl0 _T 
0.369310X10" 1 
0-295407 x 10 " 7 
0.394773 X 10 _T 
0.461670 Xl0 - 10 
0.738569 x 10"'° 
0.295387 x 10 - ,0 
0,787643 X 10 ~ s

Table 10.6 Model of Three Stations Without Truncation



System 


Stale No 


No. of stations 


in 












Slate 


as in Table 10. 1 


TT 
L 


/J 


1. 




PlYlhjl Kil i 1 u 
1 1 U UiJ t J|J t\J 


I 


1 


3 


0 


Q 


0 


0. 986606 


2 


2 


2 




0 


0 


0.739954X 10 " J 


3 


3 


2 


D 


1 


0 


0. 148435 x 10 "* 


■1 


4 


2 


0 


0 


11 


0.591964x10 " 2 


s 


5 


1 


2 


0 


II 


0. 184865X10"* 


6 


s 


I 


1 




0 


0.493508* 10 ~' 


7 


6 


1 


1 


o 


1 


0.295760x10-* 


a 


12 


1 


!) 


1 


1 


0.394775XI0" 7 


9 


7 


I 


0 


0 


2 
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0.295407 x 10" 7 


15 


Deleted
0.123121x10 i:
1

The model for a single station is represented in Table 10.5a. The addition of one more station is shown in Table 10.5b. For each state of the station being added, there is a set of the system states of Table 10.5a, except for the state (3,3). The state (3,3), representing two stations completely down (state L), is assumed not possible, since exposure to failure is assumed to be zero as soon as one station is completely down. The system states in Table 10.5b are numbered in serial order, and the numbers in parentheses indicate the combination: the first number denotes the system state before addition and the second indicates the state of the station being added. The identical states can now be grouped together as shown in Table 10.5c. The states with probabilities less than $10^{-8}$ (an arbitrary reference) are deleted and the remaining states are shown in Table 10.5d, where the state numbers are serial and have no relationship to the state numbers in Table 10.5c. Tables 10.5e and 10.5f show the addition of the third station. If a fourth station is now to be added, then the states with probabilities less than $10^{-8}$ can again be deleted and the procedure repeated. The exact results, without any truncation, are shown in Table 10.6 and are almost identical to Table 10.5f. In general, the effect on the results depends on the reference probability value employed for truncation.
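The add, merge, and truncate cycle described above can be sketched as follows. This is a simplified illustration and not the procedure of the program in reference 12: the stations are treated as statistically independent with independent station and express lanes, the only use made of the "no further failure" assumption is to drop combinations with two completely failed stations, and the probabilities are not renormalized after truncation. The function names and the default threshold are hypothetical.

```python
from collections import defaultdict

# Order of the counts in a system state: (n_U, n_D, n_L, n_L1).
STATE_NAMES = ("U", "D", "L", "L1")


def single_station(mut_lane, mdt_lane, mut_express, mdt_express):
    """Type A station state probabilities, assuming independent lanes."""
    a_lane = mut_lane / (mut_lane + mdt_lane)
    a_express = mut_express / (mut_express + mdt_express)
    return {
        "U": a_lane * a_express,
        "D": (1.0 - a_lane) * a_express,
        "L": (1.0 - a_lane) * (1.0 - a_express),
        "L1": a_lane * (1.0 - a_express),
    }


def add_station(system, station, threshold):
    """Add one station, merge identical states, delete small ones."""
    merged = defaultdict(float)
    for counts, p_system in system.items():
        for index, name in enumerate(STATE_NAMES):
            # Two completely failed stations are assumed not possible.
            if name == "L" and counts[2] >= 1:
                continue
            new_counts = list(counts)
            new_counts[index] += 1
            # Identical states share a key, so they merge here.
            merged[tuple(new_counts)] += p_system * station[name]
    return {c: p for c, p in merged.items() if p >= threshold}


def build_system(n_stations, station, threshold=1e-8):
    system = {(0, 0, 0, 0): 1.0}
    for _ in range(n_stations):
        system = add_station(system, station, threshold)
    return system


# Hypothetical data of Section 10.5.2 (hours).
station = single_station(800.0, 2.0, 1000.0, 2.0)
three_stations = build_system(3, station)
```

Under these assumptions the probability of all three stations being in normal operation, `three_stations[(3, 0, 0, 0)]`, comes out at about 0.9866, which agrees with the corresponding entry of Table 10.6.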



10.6 MODELS FOR OTHER SUBSYSTEMS



10.6.1 Power Substations Model

A single power substation is assumed to have only two states: up (i.e., the substation is working) and down (i.e., the substation has failed). The system of substations is, however, assumed to be an m/n configuration; that is, the system is good if m out of n substations are working. In other words, so long as m out of n substations are working, the power supply is adequate to keep the system running. When, however, one more substation fails, the system either fails completely or goes into severe degradation. The state transition diagram for the substations model is shown in Figure 10.8, where $\lambda_{ss}$ and $\mu_{ss}$ are the substation failure and repair rates, respectively.

10.6.2 Guideway Model

The guideway consists of the structure, power rails, and any other equipment whose failure would incapacitate the guideway. As an example, in magnetic levitation and linear induction propulsion, the guideway would include the suspension armature rail and the LIM rail. The guideway is assumed to be a two-state system: the up state, when the guideway is in satisfactory condition, and the down state, when a failure of the guideway interrupts the flow of traffic.

Figure 10.8 State transition diagram for power substations model.

10.6.3 Command and Control System Model

Command and control includes all equipment associated with the control of vehicle movement, except the vehicle-borne equipment, which is regarded as a part of the vehicle. Like the guideway, the command and control system is also assumed to be a two-state system.



10.7 TOTAL SYSTEM MODEL 

The models for various subsystems have been described in Sections 10.3-10.6. These models can be combined to give the reliability measures of the entire transit system.

10.7.1 System-Based Reliability Measures

The various subsystem models can be combined by the sequential model building described in Section 10.5.2. The subsystem models, for this purpose, are reduced to the following equivalent forms:

1. Vehicle/train system model. The vehicle or train system model is represented by a two-state equivalent such that State 1 = $(N_o \ge n)$ and State 2 = $(N_o < n)$. The equivalent transition rates are

$\lambda_{12} = \frac{f(N_o < n)}{P(N_o \ge n)}$

and

$\mu_{21} = \frac{f(N_o < n)}{P(N_o < n)}$

2. Station system model. The station system model is reduced to a multistate model whose states represent the possible combinations of stations in the working state. For example, for a system of 3 type A and 1 type B stations, there could be (3,1), (2,1), ..., (0,0) states, where the numbers in the parentheses indicate the number of type A and type B stations in the up state. The equivalent transition rates between the various states can be determined using (10.3).






3. Substation model.

State 1 = (number of up substations $\ge m$)
State 2 = (number of up substations $< m$)

The equivalent transition rates are

$\lambda_{12} = \frac{P(N_{ss} = m)\, m\, \lambda_{ss}}{P(N_{ss} \ge m)}$

$\mu_{21} = \frac{P(N_{ss} = m-1)\,(n-m+1)\,\mu_{ss}}{P(N_{ss} < m)}$

where $N_{ss}$ is the number of working substations.
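The two-state equivalent of an m/n substation system can be evaluated numerically as in the sketch below. It assumes that the substations fail and are repaired independently, so that the number of working substations follows a binomial distribution; the input values and the function name are illustrative only and are not the program data.

```python
from math import comb


def substation_equivalent(n, m, mttf_hours, mttr_hours):
    """Two-state equivalent of an m-out-of-n system of identical,
    independent two-state units. Rates are per hour."""
    lam = 1.0 / mttf_hours
    mu = 1.0 / mttr_hours
    p_up = mu / (lam + mu)
    # p[k] = probability that exactly k substations are working.
    p = [comb(n, k) * p_up**k * (1.0 - p_up)**(n - k) for k in range(n + 1)]
    p_state1 = sum(p[m:])   # m or more substations working
    p_state2 = sum(p[:m])   # fewer than m substations working
    # Leave state 1 from N = m, when any of the m working units fails.
    lam_12 = p[m] * m * lam / p_state1
    # Return from N = m - 1, when any of the n - m + 1 failed units is repaired.
    mu_21 = p[m - 1] * (n - m + 1) * mu / p_state2
    return p_state1, p_state2, lam_12, mu_21


# Illustrative values: 4 substations, 3 required.
p1, p2, lam_12, mu_21 = substation_equivalent(4, 3, 20000.0, 0.5)
frequency_down_per_day = 24.0 * p1 * lam_12
```

A useful check on such a reduction is the frequency balance: the frequency of leaving state 1, `p1 * lam_12`, equals the frequency of leaving state 2, `p2 * mu_21`.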

The station system model, the substation system model, the command and control model, and the guideway model are combined together, and the resulting states are grouped as $(N_A = i, N_B = j)$, where $N_A$ and $N_B$ indicate the number of type A and type B stations up, respectively. The state $(N_A = 0, N_B = 0)$ includes the condition of having the substation system, guideway, or command and control down. Therefore, $(N_A = 0, N_B = 0)$ does not necessarily mean that all the stations have failed. It really means that the stations are not available for embarkation because the system is not moving. This combined model is then combined with the vehicle system model to give the measures for the entire system.

The printout of the reliability measures of the system is shown in Tables 10.1I-10.1L. The data for the vehicle system are shown in Table 10.1A and for the other subsystems in Tables 10.1E-10.1H.

10.7.2 Including Demand [13] 
The calculation of probabilities, frequencies, and the mean duration of the various deficient states of the system is illustrated in Section 10.7.1. These system deficiencies can be further related to the delays that they cause to the vehicles or passengers, and then the probabilities of these delays can be
Table 10.1E Passenger stations data

NUMBER OF TYPE A STATIONS = 3
NUMBER OF TYPE B STATIONS = 1
MEAN TIME TO FAILURE OF THE STATION = 3260.68 HOURS
MEAN TIME TO REPAIR OF THE STATION = 1.97 HOURS
MEAN TIME TO FAILURE OF THE EXPRESSWAY = A666<.6.10 HOURS
MEAN TIME TO REPAIR OF THE EXPRESSWAY = 10.00 HOURS

Table 10.1F Power substations data and results

NUMBER OF SUBSTATIONS = 4
MINIMUM NUMBER OF SUBSTATIONS REQUIRED FOR OPERATION = 3
MEAN TIME TO SUBSTATION FAILURE = 2000 0.90 HOURS
MEAN TIME TO SUBSTATION REPAIR = 0.50 HOURS

SUBSTATION SYSTEM AVAILABILITY = 0.1 0000 DD 01
FREQUENCY OF ENCOUNTERING THE DOWN STATE = 0-B9901OD-U PER DAY
CYCLE TIME TO ENCOUNTER DOWN STATE = 0 . It 1 I 2?D 12 DAYS
MEAN DURATION OF DOWN STATE = 0 .692671 O-OS DAYS

Table 10.1G Command and control system data and results

MEAN TIME TO COMMAND AND CONTROL FAILURE = 1666.67 HOURS
MEAN TIME TO COMMAND AND CONTROL REPAIR = 2.58 HOURS

COMMAND AND CONTROL AVAILABILITY = 0.qgB*5*E 00
FREQUENCY OF ENCOUNTERING DOWN STATE = 0.143777E-01 PER DAY
CYCLE TIME TO ENCOUNTER DOWN STATE = 0.69552JE 02 DAYS
MEAN DURATION OF DOWN STATE = 0.107300E 00 DAYS

Table 10.1H Guideway data and results

MEAN TIME TO GUIDEWAY FAILURE = 50000.00 HOURS
MEAN TIME TO GUIDEWAY REPAIR = 10.00 HOURS

GUIDEWAY AVAILABILITY = 0.999800E 00
FREQUENCY OF ENCOUNTERING DOWN STATE = 0.479904E-03 PER DAY
CYCLE TIME TO ENCOUNTER DOWN STATE = 0.208375E 04 DAYS
MEAN DURATION OF DOWN STATE = 0.416667E 00 DAYS

Table 10.1I System reliability measures



• OF TYPE A STAT1UW5 #DF TYPE H STATIONS * OF VEHICLES &^-«.T-:fl 

vjd UP TK*« 

3 1 * 

1 1 £ 

a ° « 



■ OF TYPE * ST*Tt>IS * OF TYPE B STATIONS • OF VEHICLES t*J*L 



UP 3« LE« TMAM 

4 



«0U ABILITY 



CYCLE TI ME 



hea« 



JFfl DAY DAYS D-JRATION 

O.«5T»0/l OO 0 .70*. P'SE-O* O.I*l6SftE DZ 14,108 

0.1 9OJ2TO-0Z J.ZZ0M5E-O) 0.M2SZTE <W 9. OSS 

O.10B743D-04 O.ZO«3SF-0« 0.JTWQ7E OS 0,0*1 

». I7t»uo-0z D.i*9ll*e-ai o,4tm;be oz o.nr 



probability 



FPEOD*MCY 



I. VI l • T | U = 



d. rooaa&a-os 

0 . I Z S3 BSD- 05 
0.TT4JOZO-O9 

o.iz»l(j»t)-os 



aCH DAY 

0 .337R39E-01 

a.76H3ise-o« 

0 , S57ft I T.E-07 
0 .&9TZORE-0* 



DAYS DURATION 

o.z«9iie oz a.ozi 

.j . i ' : i •• 05 O.OJT 

0.IT9335E OH 0,01* 

a.i*iAz«F as a.oift 



Table 10.1J System reliability measures



« OP frPE A STATrU"J5 
UP 

a 

t 
i 
o 



•OF fTP£ B STATIONS 
UP 



OF VEHICLES GOEATEH 
THAN 

9 

M 

a 

u 



PROBABILITY 



0.94575^0 00 
O- I SO3323-02 
Oil BHT*>1 0- 95 
Q. I 744(60-02 



frequency 

PER DAT 

j .70 atna e- oi 

D.2;o*39E-OI 

a .z«5S3*e-o* 

D .149I09C-OI 



CYCLE TIM" WE4W 

. DURATION 

D-I4265TE 02 14.205 

0.452A30E 02 Dp QB£

calculated. The extent to which such an analysis can be carried out depends upon the information available on the flow of passengers. The calculation of delays can be illustrated by an example. Consider a system such that only the vehicle fleet need be considered and the other subsystems can be ignored. The delay could be caused by the following types of system deficiencies:

1. A vehicle could fail in the retrieval mode, blocking the flow of vehicles. This condition will last until the failed vehicle has been removed and the system put back into normal operation. Let the probability of being in this state be denoted by $P_R$.

2. A vehicle could fail in a partial mode, degrading the operation of the system until the vehicle finally clears the system. Let the probability of being in this state be denoted by $P_P$.

3. The delay could also be caused because the number of vehicles available for service is less than the required number. This condition can result when spare vehicles are not available to replace the failed ones.

These probabilities can be calculated using the models and methods described in this report. They can then be weighted with the probabilities of delay caused by these conditions. This will yield the probability of delay caused by system deficiencies. The calculation of probabilities of delay by the deficient conditions is not covered in this chapter.
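The weighting step can be sketched as a simple sum of products. The sketch assumes that the deficient conditions are mutually exclusive, and every number in it is hypothetical; in particular the conditional probabilities of delay would have to come from passenger flow information, which this chapter does not cover.

```python
def delay_probability(condition_probs, delay_given_condition):
    """Probability of delay = sum over deficient conditions of
    P(condition) * P(delay | condition)."""
    return sum(condition_probs[name] * delay_given_condition[name]
               for name in condition_probs)


# Hypothetical probabilities of the three deficient conditions.
condition_probs = {
    "retrieval": 0.002,
    "partial": 0.010,
    "fewer_vehicles_than_required": 0.005,
}
# Hypothetical probabilities that each condition causes a delay.
delay_given_condition = {
    "retrieval": 1.0,
    "partial": 0.5,
    "fewer_vehicles_than_required": 0.2,
}
p_delay = delay_probability(condition_probs, delay_given_condition)
```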



10.8 SYSTEM STUDIES 

Some vehicle system sensitivity studies using these models are reported here. The system is assumed to consist of 50 operating vehicles, and the relevant data are listed on the appropriate figures. The vehicle system state with the number of vehicles less than or equal to 49 is considered as the reference.
10.8.1 Sensitivity of Reliability Indices to the Vehicle MTBF (Retrieval Mode)

The effect of variation in vehicle MTBF on the probability, mean time to encounter, and mean duration of the system state with vehicles $\le 49$ is shown in Figures 10.9-10.11. The reliability indices are plotted for three cases: (s=2, m=2), (s=0, m=2), and (s=0, m=0). The probability of $N_o \le 49$ is the lowest and the most sensitive in the case of s=2, m=2. This is because when there are no spare vehicles, the vehicle failure rate in partial mode begins to be effective, and since the partial mode MTBF is 500 hours as compared with 5000 hours for the retrieval mode, the partial failure mode dominates for s=0. The mean time to encounter $N_o \le 49$ is again highest for s=2, m=2 and shows the most sensitivity to variation in retrieval mode vehicle MTBF. This is because for s=0, the partial mode of failure dominates. The mean duration of $N_o \le 49$ is insensitive to the variation in vehicle MTBF since, with spare vehicles available to replace the failed ones, this index is more or less determined by the mean time to retrieve.

10.8.2 Sensitivity of Reliability Indices to the Mean Time to Repair a Vehicle

The effect of variation in vehicle MTTR on the reliability indices of the system is shown in Figures 10.12-10.14. The probability and the mean duration of $N_o \le 49$ decrease with the increase in the MTTR and the mean time to encounter $N_o \le 49$ correspondingly increases. For s=0, the mean time to encounter $N_o \le 49$ is relatively insensitive to the MTTR. This is because in this case the system behaves more or less like a series system, and the mean time to encounter $N_o \le 49$ is controlled by the vehicle MTBF (both partial and retrieval mode). The sensitivity of the indices to the vehicle MTTR also depends on the ratio of the spare vehicles to the operating vehicles. The higher the ratio, the less sensitive the indices are to MTTR [6], because with the spare vehicles available to replace the failed ones, the retrieval time dominates the time to repair.



10.8.3 Effect of Spare Vehicles

Figure 10.15 shows the effect of the number of spares on the reliability indices. As expected, the system reliability improves by having spare vehicles. After a certain number of spares, the effect is, however, incrementally small, and this number may be termed the "infinite spare capacity" for the system, as there is little improvement beyond this point.

10.8.4 Effect of Entraining Vehicles

A study on the effect of entraining is given in reference 6. The process of entraining vehicles modifies the modes of failure by converting some retrieval mode failures into partial modes and introducing some additional elements for failure, for example, couplers and trainlines.



10.9 CONCLUDING REMARKS

The state space reliability models for track-bound transit systems have been described and the calculation of the system-based and demand-based reliability measures discussed. This work was originally carried out for application to a single loop configuration. The methodology can be applied to more complicated networks.

10.9.1 Effect of Peak and Off Peak Periods

Since these models were developed for application to a demonstration project, the effect of peak and off peak periods was ignored. Therefore these models can find direct application for demonstration, airport, or downtown people mover systems where the ratio of peak to off peak periods is closer to unity.

There are two ways [13] of including the effect of peak and off peak periods:

1. If it can be assumed that all the vehicles that failed over the previous day have been repaired by the following morning, so that the initial probability vector every morning is the same, the models can be solved in a time-specific manner over the peak and off peak periods. This will involve the solution of differential equations, instead of the linear equations, for the state probabilities.

2. The models can be modified to include the effect of peak and off peak periods. The approach outlined previously in (1) is, however, simpler to implement.
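To illustrate what a time-specific solution involves, the sketch below integrates the state equations of a single two-state unit from a known initial probability vector and compares the result with the closed-form solution. It is a minimal example under assumed rates; a fourth-order Runge-Kutta scheme is used here only for convenience and is not claimed to be the method of reference 13.

```python
from math import exp


def derivative(p, lam, mu):
    """dP/dt for a two-state unit: p = (P_up, P_down)."""
    p_up, p_down = p
    return (-lam * p_up + mu * p_down, lam * p_up - mu * p_down)


def integrate(p0, lam, mu, t_end, steps):
    """Integrate the state equations with fixed-step RK4."""
    dt = t_end / steps
    p = tuple(p0)
    for _ in range(steps):
        k1 = derivative(p, lam, mu)
        k2 = derivative(tuple(x + 0.5 * dt * k for x, k in zip(p, k1)), lam, mu)
        k3 = derivative(tuple(x + 0.5 * dt * k for x, k in zip(p, k2)), lam, mu)
        k4 = derivative(tuple(x + dt * k for x, k in zip(p, k3)), lam, mu)
        p = tuple(x + dt * (a + 2.0 * b + 2.0 * c + d) / 6.0
                  for x, a, b, c, d in zip(p, k1, k2, k3, k4))
    return p


lam = 1.0 / 500.0   # assumed failure rate per hour
mu = 1.0 / 2.5      # assumed repair rate per hour
t_end = 4.0         # assumed length of the period in hours

# The unit is assumed to be up at the start of the period.
p_up_numeric, p_down_numeric = integrate((1.0, 0.0), lam, mu, t_end, 400)
p_up_exact = mu / (lam + mu) + lam / (lam + mu) * exp(-(lam + mu) * t_end)
```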

10.9.2 Simulation Versus Analytical Method

As the systems become more complex, the analytical techniques become more difficult to apply. Simulation using Monte Carlo techniques can be used to perform the reliability analysis of track-bound systems. The simulation method is conceptually easy to apply but could be quite expensive for sensitivity studies. Sometimes it may be possible to apply a hybrid approach, that is, part solution by analytical methods and part by simulation. For example, the system-based probabilities may be calculated by analytical models and the probabilities of delays by specific system deficiencies calculated by simulation, and the two results combined to yield demand-based measures.
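The flavor of the Monte Carlo approach can be seen in the following sketch, which estimates the availability of a single two-state unit by sampling exponentially distributed up and down times and compares the estimate with the analytical value. The unit data, the number of cycles, and the seed are arbitrary assumptions; a real transit system simulation would be far more elaborate.

```python
import random


def simulate_availability(mttf_hours, mttr_hours, cycles, seed=1):
    """Estimate availability from simulated up/down cycles."""
    rng = random.Random(seed)
    up_time = 0.0
    down_time = 0.0
    for _ in range(cycles):
        # Exponential times to failure and to repair are assumed.
        up_time += rng.expovariate(1.0 / mttf_hours)
        down_time += rng.expovariate(1.0 / mttr_hours)
    return up_time / (up_time + down_time)


estimate = simulate_availability(500.0, 2.5, cycles=20000)
exact = 500.0 / (500.0 + 2.5)
```

Even for this trivial case many cycles are needed before the estimate settles close to the analytical value, which is one reason simulation becomes expensive when it has to be repeated for sensitivity studies.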

10.9.3 Failure Data

Subsystem failure rate is an important input parameter for transit system reliability modeling and calculation. For transit systems using newer technologies, these figures are usually synthesized from part failure rates available from handbooks or other data collection and exchange programs. Although such information on conventional transit systems could be available from field experience, no collective effort at the national or international level has been visible in this regard. A data collection and analysis activity to fulfill this need has been carried out and the results are reported in references 8 and 14.

10.9.4 Further Work

The simulation techniques are conceptually simpler for calculating the reliability measures but consume considerable computer time, especially when performing sensitivity studies. The analytical methods become quite complicated when applied to complex system configurations but are very suitable for sensitivity analysis. There is a need for further development of the analytical and simulation methods for application to more complex network configurations, taking into account the peak and off peak periods, and including demand in deriving suitable measures of reliability.

Three-State Device Networks



AH derivations in the appendix are based on the binomial expansion 
(P+q)". In the case of three-state device structures, the expansion is 
modified to (P+q„ + q t T* where P is the component probability of success, 
q 9 open failure probability t q 3 short failure probability, and n is the number 
of independent elements. 

A.1 SERIES STRUCTURE 

A.1.1 Reliability Expression 
Let n = 2. Therefore 

$$(P + q_o + q_s)^2 = 1$$

Thus 

$$P^2 + 2Pq_o + 2Pq_s + q_o^2 + 2q_oq_s + q_s^2 = 1$$

The number of state combinations for n = 2 is $3^2 = 9$. The state combination 
truth table may be represented as follows: 

Table A1. 

NN NO NS 

ON OO OS 

SN SO SS 



where N = the normal mode of the device (success) 
O = the open mode failure state 
S = the short mode failure state 
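
The truth table above can also be generated mechanically. The following is a minimal sketch, assuming independent and identical devices; the function names and the numerical values of the open and short failure probabilities are illustrative assumptions, not taken from the text. It lists the state combinations and checks that their probabilities sum to unity, as the expansion requires.

```python
from itertools import product

def state_table(n):
    """Return all 3**n state combinations of n three-state devices."""
    return ["".join(s) for s in product("NOS", repeat=n)]

def state_probability(state, p, q_o, q_s):
    """Probability of one combination for independent, identical devices."""
    prob = {"N": p, "O": q_o, "S": q_s}
    result = 1.0
    for mode in state:
        result *= prob[mode]
    return result

# Illustrative values (assumed); p + q_o + q_s must equal 1.
q_o, q_s = 0.05, 0.02
p = 1.0 - q_o - q_s

table = state_table(2)
print(table)  # the nine combinations, in the row order of the table above

total = sum(state_probability(s, p, q_o, q_s) for s in table)
print(abs(total - 1.0) < 1e-12)  # the state probabilities sum to unity
```

Calling `state_table(3)` gives the 27 combinations needed for the n = 3 case.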

The reliability terms, by inspection of the truth table, Table A1, are 

$$R = PP + Pq_s + q_sP = P^2 + 2Pq_s \tag{A.1}$$

Since $P = 1 - q_o - q_s$, 

$$R = (1 - q_o)^2 - q_s^2 \tag{A.2}$$

Now let n = 3. Therefore 

$$(P + q_o + q_s)^3 = 1$$

Thus 

$$P^3 + 3P^2q_o + 3P^2q_s + 3Pq_o^2 + 3Pq_s^2 + 6Pq_oq_s + q_o^3 + 3q_o^2q_s + 3q_oq_s^2 + q_s^3 = 1$$

The number of state combinations for n = 3 is $3^3 = 27$. The new state 
combinations truth table is as shown in Table A2. 

Table A2. 

NNN SSS OOO 

NOO SOO OSS 

OON OOS SSO 

NON OSO ONN 

ONO SOS NNO 

NSS SNN NSO 

SSN NNS NOS 

SNS NSN SNO 

SON OSN ONS 


The reliability terms, by inspection of Table A2, are 

$$PPP + Pq_sq_s + Pq_sq_s + Pq_sq_s + PPq_s + PPq_s + PPq_s$$

therefore 

$$R = P^3 + 3Pq_s^2 + 3P^2q_s \tag{A.3}$$

By substituting $P = 1 - q_o - q_s$ in (A.3): 

$$R = (1 - q_o)^3 - q_s^3 \tag{A.4}$$

Obviously, from (A.2) and (A.4), the general equation for the reliability of 
the series structure can be written as follows. For identical components: 

$$R = (1 - q_o)^n - q_s^n \tag{A.5}$$




and in the case of nonidentical components: 

$$R = \prod_{i=1}^{n} (1 - q_{oi}) - \prod_{i=1}^{n} q_{si} \tag{A.6}$$
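
The general series expression can be checked against a direct enumeration of the state combinations. The sketch below assumes independent elements; the function names and the element failure probabilities are illustrative assumptions. A series structure is taken to work when no element is open and the elements are not all shorted, which is the rule used when reading the truth tables above.

```python
from itertools import product
from math import prod

def series_reliability(q_open, q_short):
    """Closed form: prod(1 - q_oi) - prod(q_si) over the n elements."""
    return prod(1.0 - q for q in q_open) - prod(q_short)

def series_reliability_by_enumeration(q_open, q_short):
    """Sum the probabilities of all working states of a series structure.

    A state works when no element is open (O) and at least one element
    is normal (N), i.e. the elements are not all shorted (S).
    """
    n = len(q_open)
    total = 0.0
    for state in product("NOS", repeat=n):
        if "O" in state or "N" not in state:
            continue
        term = 1.0
        for i, mode in enumerate(state):
            p_i = 1.0 - q_open[i] - q_short[i]
            term *= {"N": p_i, "O": q_open[i], "S": q_short[i]}[mode]
        total += term
    return total

# Illustrative element data (assumed values, not from the text).
q_open = [0.05, 0.10, 0.02]
q_short = [0.02, 0.03, 0.01]

closed = series_reliability(q_open, q_short)
enumerated = series_reliability_by_enumeration(q_open, q_short)
print(abs(closed - enumerated) < 1e-12)  # the two approaches agree
```

The enumeration grows as 3^n, so it is useful only as a check for small n; the closed form is what one would use in practice.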

A.1.2 Probabilities of Failure 
At n = 2. From Table A1, open mode failure terms 

$$q_oP + Pq_o + q_oq_o + q_sq_o + q_oq_s$$

Since 

$$P = 1 - q_o - q_s \tag{A.7}$$

Therefore the series system open mode failure probability 

$$Q_o = 1 - (1 - q_o)^2 \tag{A.8}$$

and similarly for short-mode failure probability 

$$Q_s = q_s^2 \tag{A.9}$$
At n = 3. According to Table A2, open mode failure terms 

$$3Pq_o^2 + 3P^2q_o + 6Pq_oq_s + 3q_o^2q_s + 3q_s^2q_o + q_o^3 \tag{A.10}$$

By substituting for P in (A.10), open-mode failure probability 

$$Q_o = 1 - (1 - q_o)^3 \tag{A.11}$$

and in the same way for short-mode failure 

$$Q_s = q_s^3 \tag{A.12}$$

In the case of identical components, according to (A.8) and (A.11), the 
general form of open mode system failure 

$$Q_o = 1 - (1 - q_o)^n \tag{A.13}$$

Similarly, for nonidentical components 

$$Q_o = 1 - \prod_{i=1}^{n} (1 - q_{oi}) \tag{A.14}$$

In the case of short mode system failure for identical components 

$$Q_s = q_s^n \tag{A.15}$$

and for the nonidentical elements case 

$$Q_s = \prod_{i=1}^{n} q_{si} \tag{A.16}$$
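
As a numerical illustration of the series failure expressions, the following sketch evaluates the open-mode and short-mode failure probabilities for nonidentical elements and recovers the reliability as the remaining probability. The function names and element values are assumptions chosen for illustration only.

```python
from math import prod

def series_open_failure(q_open):
    """Q_o = 1 - prod(1 - q_oi): one open element opens the whole chain."""
    return 1.0 - prod(1.0 - q for q in q_open)

def series_short_failure(q_short):
    """Q_s = prod(q_si): the chain is shorted only if every element shorts."""
    return prod(q_short)

# Illustrative element data (assumed values, not from the text).
q_open = [0.05, 0.10, 0.02]
q_short = [0.02, 0.03, 0.01]

Q_o = series_open_failure(q_open)
Q_s = series_short_failure(q_short)

# Working, open-failed and short-failed are mutually exclusive and
# exhaustive system states, so the reliability is what remains.
R = 1.0 - Q_o - Q_s
print(f"Q_o = {Q_o:.6f}, Q_s = {Q_s:.6f}, R = {R:.6f}")
```

The value of `R` obtained this way equals the product of the (1 - q_oi) terms minus the product of the q_si terms, which is the series reliability expression for nonidentical components.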



A.2 PARALLEL STRUCTURE 

A.2.1 Reliability Expression 

Let the number of parallel elements m = 2. Thus 

$$P^2 + 2Pq_o + 2Pq_s + q_o^2 + 2q_oq_s + q_s^2 = 1$$

The reliability terms, by inspection of Table A1 

$$R = PP + q_oP + Pq_o = P^2 + 2Pq_o \tag{A.17}$$

Since $P = 1 - q_o - q_s$, therefore 

$$R = (1 - q_s)^2 - q_o^2 \tag{A.18}$$

At m = 3. Thus 

$$P^3 + 3P^2q_o + 3P^2q_s + 3Pq_o^2 + 3Pq_s^2 + 6Pq_oq_s + q_o^3 + 3q_o^2q_s + 3q_oq_s^2 + q_s^3 = 1$$

By inspecting Table A2, system reliability 

$$R = P^3 + 3q_o^2P + 3P^2q_o \tag{A.19}$$

Replace P with (A.7); therefore 

$$R = (1 - q_s)^3 - q_o^3 \tag{A.20}$$

With the aid of (A.18) and (A.20), the general system reliability formula 
for identical elements connected in the parallel configuration becomes as 
follows: 

$$R = (1 - q_s)^m - q_o^m \tag{A.21}$$

The above equations for nonidentical elements may be rewritten as 

$$R = \prod_{i=1}^{m} (1 - q_{si}) - \prod_{i=1}^{m} q_{oi} \tag{A.22}$$
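
The parallel expression has the same form as the series one with the roles of the open and short failure probabilities interchanged. The sketch below, with assumed function names and assumed failure probabilities, evaluates the parallel reliability for a few values of m and confirms this interchange numerically.

```python
from math import prod

def parallel_reliability(q_open, q_short):
    """R = prod(1 - q_si) - prod(q_oi) for m elements in parallel."""
    return prod(1.0 - q for q in q_short) - prod(q_open)

def series_reliability(q_open, q_short):
    """R = prod(1 - q_oi) - prod(q_si) for n elements in series."""
    return prod(1.0 - q for q in q_open) - prod(q_short)

# Illustrative identical-element data (assumed values, not from the text).
q_o, q_s = 0.05, 0.02

for m in (1, 2, 3, 4):
    r_parallel = parallel_reliability([q_o] * m, [q_s] * m)
    # Interchanging open and short probabilities in the series formula
    # reproduces the parallel result.
    r_swapped = series_reliability([q_s] * m, [q_o] * m)
    print(m, round(r_parallel, 6), abs(r_parallel - r_swapped) < 1e-12)
```

With these assumed values the reliability does not keep rising with m, because each added parallel element also adds a way for the system to fail short.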

A.2.2 Failure Probabilities 
For m = 2. Collected short-mode failure terms from Table A1 

$$q_sP + q_sq_o + q_sq_s + Pq_s + q_oq_s = q_s^2 + 2Pq_s + 2q_oq_s$$

Thus: 

$$Q_s = 1 - (1 - q_s)^2 \tag{A.23}$$

Similarly, for open-mode failure 

$$Q_o = q_o^2 \tag{A.24}$$

At m = 3. Short-mode system failure terms from Table A2 

$$q_s^3 + 3Pq_s^2 + 6Pq_oq_s + 3P^2q_s + 3q_oq_s^2 + 3q_o^2q_s$$

Thus 

$$Q_s = 1 - (1 - q_s)^3 \tag{A.25}$$

Similarly, in the case of open mode failure 

$$Q_o = q_o^3 \tag{A.26}$$

As seen from (A.23) and (A.25), the short-mode failure equation for 
identical elements can be generalized as follows: 

$$Q_s = 1 - (1 - q_s)^m \tag{A.27}$$

Similarly, for the nonidentical elements case 

$$Q_s = 1 - \prod_{i=1}^{m} (1 - q_{si}) \tag{A.28}$$

The corresponding generalized open mode system failure probability 

$$Q_o = q_o^m \quad \text{identical components} \tag{A.29}$$

and 

$$Q_o = \prod_{i=1}^{m} q_{oi} \quad \text{nonidentical components} \tag{A.30}$$
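
The parallel failure expressions can be verified in the same way as the truth-table derivation, by classifying every state combination. This is a minimal sketch under the assumption of independent elements; the function names and numerical values are illustrative. A parallel structure is taken to fail short when any element is shorted and to fail open when every element is open.

```python
from itertools import product
from math import prod

def parallel_short_failure(q_short):
    """Q_s = 1 - prod(1 - q_si): any shorted element shorts the system."""
    return 1.0 - prod(1.0 - q for q in q_short)

def parallel_open_failure(q_open):
    """Q_o = prod(q_oi): the system is open only if every element is open."""
    return prod(q_open)

def parallel_failures_by_enumeration(q_open, q_short):
    """Return (Q_o, Q_s) by summing over all 3**m state combinations."""
    m = len(q_open)
    open_total = 0.0
    short_total = 0.0
    for state in product("NOS", repeat=m):
        term = 1.0
        for i, mode in enumerate(state):
            p_i = 1.0 - q_open[i] - q_short[i]
            term *= {"N": p_i, "O": q_open[i], "S": q_short[i]}[mode]
        if "S" in state:
            short_total += term
        elif all(mode == "O" for mode in state):
            open_total += term
    return open_total, short_total

# Illustrative element data (assumed values, not from the text).
q_open = [0.05, 0.10, 0.02]
q_short = [0.02, 0.03, 0.01]

enum_open, enum_short = parallel_failures_by_enumeration(q_open, q_short)
print(abs(enum_open - parallel_open_failure(q_open)) < 1e-12)
print(abs(enum_short - parallel_short_failure(q_short)) < 1e-12)
```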



